An Introduction to the Statistical Drake Equation


1. Introduction


SETI (an acronym for “Search for Extraterrestrial Intelligence”) is a relatively 
new branch of scientific research, having begun only in 1959. Its goal is to 
ascertain whether alien civilizations exist in the universe, how far from us 
they exist, and possibly how much more advanced than us they may be. 


As of 2009, the only physical tools we know of that could help us get in touch
with aliens are the electromagnetic waves that an alien civilization could emit and
that we could detect. This forces us to use the largest radiotelescopes on Earth for
SETI research, because the larger our collecting area for electromagnetic
radiation, the higher our sensitivity (that is, the farther into space we can
probe). Yet, even by using the largest radiotelescopes on Earth (the 305-meter
dish at Arecibo, for instance), we cannot search for aliens beyond, say, a few
hundred light years away. This is a very, very small amount of space around us
within our galaxy, the Milky Way, which is about 100,000 light years in diameter.
Thus, current SETI can cover only a very tiny fraction of the galaxy, and it is
not surprising that in the past 50 years of SETI searches, NO extraterrestrial
civilization has been discovered. Quite simply, we did not get far enough!


This demands the construction of much more powerful and radically new
radiotelescopes. Rather than big and heavy metal dishes, whose mechanical
problems hamper SETI research too much, we are now turning to "software
radiotelescopes," where a large number of small dishes (ATA = Allen
Telescope Array, and ALMA = Atacama Large Millimeter/submillimeter Array),
or even just simple dipoles (LOFAR = Low Frequency Array), using
state-of-the-art electronics and very-high-speed computing, can outperform the
classical radiotelescopes in many regards. The final dream in this field is the
SKA (= Square Kilometer Array), currently being designed and expected to be
completed around 2020.


2. The Key Question: How Far Are They?
But still, the key question remains: how far are they? 


Or, more correctly, how far do we expect the NEAREST extraterrestrial civilization to be 
from the Solar System in the galaxy? 


This question was first tackled in a scientific manner back in 1961 by the same scientist
who was also the first experimental SETI radio astronomer ever: the American Frank
Donald Drake (born 1930). He first considered the shape and size of the galaxy where
we live: the Milky Way. This is a spiral galaxy measuring some 100,000 light
years in diameter, whose Galactic Disk is some 16,000 light years thick halfway
from its center. That is:


The diameter of the galaxy is about 100,000 light years (abbreviated ly), i.e., its
radius, $R_{Galaxy}$, is about 50,000 ly.
The thickness of the Galactic Disk halfway from its center, $h_{Galaxy}$, is about 16,000 ly.


The volume of the galaxy may then be approximated as the volume of the 
corresponding cylinder, i.e. 


$$V_{Galaxy} = \pi R_{Galaxy}^2 \, h_{Galaxy} \tag{1}$$


Now consider the sphere around us having a radius r. The volume of such a sphere is

$$V_{Our\,Sphere} = \frac{4}{3}\pi r^3 = \frac{4}{3}\pi \left(\frac{ET\_Distance}{2}\right)^3 \tag{2}$$
In the last equation, we had to divide the distance "ET_Distance" between ourselves
and the nearest ET civilization by 2 because we are now going to make the
unwarranted assumption that all ET civilizations are equally spaced from each
other in the galaxy! Each civilization then sits at the center of its own sphere of
radius ET_Distance/2, and the spheres of neighboring civilizations just touch.
This is a crazy assumption, clearly, and should be replaced by
more scientifically grounded assumptions as soon as we know more about our Galactic
Neighborhood. At the moment, however, this is the best guess that we can make, and
so we shall take it for granted, although we are aware that this is a weak point in the
reasoning.


Furthermore, let us denote by N the total number of civilizations now living in the
galaxy, including ourselves. Of course, this number N is unknown. We only know that
N ≥ 1, since at least one civilization (ours) does exist!


Having thus assumed that ET civilizations are UNIFORMLY SPACED IN THE GALAXY, we 
can then write down the proportion: 


$$\frac{V_{Galaxy}}{N} = \frac{V_{Our\,Sphere}}{1} \tag{3}$$


That is, upon substituting both (1) and (2) into (3):

$$\frac{\pi R_{Galaxy}^2 \, h_{Galaxy}}{N} = \frac{\frac{4}{3}\pi\left(\frac{ET\_Distance}{2}\right)^3}{1} \tag{4}$$

The last equation contains two unknowns, N and ET_Distance, and so we do not know
which one it is better to solve for.


However, we may suppose that, by resorting to the (rather uncertain) knowledge that
we have about the evolution of the galaxy over the last 10 billion years or so, we
might somehow compute an approximate value for N.


Then, we may solve (4) for ET_Distance, thus obtaining the (AVERAGE) DISTANCE
BETWEEN ANY PAIR OF NEIGHBORING CIVILIZATIONS IN THE GALAXY (DISTANCE
LAW):

$$ET\_Distance(N) = \frac{\sqrt[3]{6\,R_{Galaxy}^2\,h_{Galaxy}}}{\sqrt[3]{N}} = \frac{C}{\sqrt[3]{N}} \tag{5}$$

where the positive constant C is defined by

$$C = \sqrt[3]{6\,R_{Galaxy}^2\,h_{Galaxy}} \approx 28{,}845 \text{ light years} \tag{6}$$
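For readers who want the intermediate algebra, here is the short step from (4) to (5), written out as our own expansion of the text. Since (4/3)π(ET_Distance/2)³ = (π/6)·ET_Distance³, the factor π cancels on both sides and the cube root can be taken directly:

```latex
\frac{\pi R_{Galaxy}^2 h_{Galaxy}}{N} = \frac{\pi}{6}\,ET\_Distance^3
\;\Longrightarrow\;
ET\_Distance^3 = \frac{6\,R_{Galaxy}^2 h_{Galaxy}}{N}
\;\Longrightarrow\;
ET\_Distance = \frac{\sqrt[3]{6\,R_{Galaxy}^2 h_{Galaxy}}}{\sqrt[3]{N}}
```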


Equations (5) and (6) are the starting point to understand the origin of the Drake 
equation that we discuss in detail in Section 3 of this paper. 


Let us just complete this section by pointing out three different numerical cases of the 
distance law (5): 


• We know that we exist, so N cannot be smaller than 1, i.e., N ≥ 1. Suppose then
that we are alone in the galaxy, i.e., that N = 1. Then the distance law (5) yields as
distance to the nearest civilization just the constant C, i.e., 28,845 light
years. This is about the distance between ourselves and the center of the galaxy
(i.e., the Galactic Bulge). Thus, this result seems to suggest that, if we do not find
any extraterrestrial civilization around us in these outskirts of the galaxy where we
live, we should look toward the Galactic Center first. And this is indeed what is
happening: many SETI searches are actually pointing their antennas towards the
Galactic Center, looking for beacons (see, for instance, ref. [1]).


• Suppose next that N = 1000, i.e., there are about a thousand extraterrestrial
communicating civilizations in the whole galaxy right now. Then the distance law (5)
yields an average distance of 2,885 light years. This is a distance that most
radiotelescopes on Earth cannot reach for SETI searches right now: hence the need
to build larger radiotelescopes, like ALMA, LOFAR and the SKA.


• Suppose finally that N = 1,000,000, i.e., there are a million communicating civilizations
now in the galaxy. Then the distance law (5) yields an average distance of 288 light
years. This is within the (upper) range of distances that our current radiotelescopes
may reach for SETI searches, and that justifies all SETI searches that have been
done so far in the first fifty years of SETI (1960-2010).
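The distance law (5) is easy to evaluate numerically. The Python sketch below is our own illustration (the function names are hypothetical). It uses the value C ≈ 28,845 ly quoted in (6) and reproduces the three cases above. It also evaluates (6) directly from the galactic dimensions: the cube root returns about 28,845 ly for h_Galaxy = 1,600 ly, whereas h_Galaxy = 16,000 ly would give about 62,100 ly, so the case values above follow from the quoted C.

```python
# Constant C of equation (6), as quoted in the text, in light years.
C_QUOTED_LY = 28_845.0


def c_from_geometry(r_galaxy_ly: float, h_galaxy_ly: float) -> float:
    """Equation (6): C = (6 R^2 h)^(1/3), from galactic radius and disk thickness."""
    return (6.0 * r_galaxy_ly**2 * h_galaxy_ly) ** (1.0 / 3.0)


def et_distance(n_civilizations: float, c_ly: float = C_QUOTED_LY) -> float:
    """Distance law (5): average distance between neighboring civilizations, in ly."""
    if n_civilizations < 1:
        raise ValueError("N must be at least 1, since we exist")
    return c_ly / n_civilizations ** (1.0 / 3.0)


for n in (1, 1_000, 1_000_000):
    print(f"N = {n:>9,d}: average distance ~ {et_distance(n):,.0f} ly")

print(f"C with h = 16,000 ly: {c_from_geometry(50_000, 16_000):,.0f} ly")
print(f"C with h =  1,600 ly: {c_from_geometry(50_000, 1_600):,.0f} ly")
```

The law only divides C by the cube root of N, so each thousandfold increase in N shrinks the average distance tenfold, which is exactly the pattern of the three cases.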


In conclusion, interpolating between the above three special cases of N, the
distance law (5) yields the following key diagram of the average ET distance vs. the
assumed number of communicating civilizations, N, in the galaxy right now (Figure 1):
[Figure 1 plot: Average DISTANCE of the nearest ET civilization vs. the ASSUMED NUMBER of ET civilizations in the Galaxy]
Figure 1. DISTANCE LAW, i.e., the Average Distance (plotted along the vertical axis in light years) Versus
the NUMBER of Communicating Civilizations ASSUMED to Exist in the Galaxy Right Now


3. Computing N By Virtue of the Drake Equation (1961)


In the previous section, the problem of finding how close the nearest ET civilization may
be was "solved" by reducing it to the computation of N, the total number of
extraterrestrial civilizations now existing in this galaxy. In this section we describe the
famous Drake equation, proposed back in 1961 by Frank Donald Drake
(born 1930) to estimate the numerical value of N. We believe that no better
introductory description of the Drake equation exists than the one given by Carl
Sagan in his 1983 book "Cosmos" (ref. [2]), itself based on the famous TV series
"Cosmos." So, in this section we report Carl Sagan's description of the Drake
equation unabridged.


“But is there anyone out there to talk to? With a third or a half a trillion stars in our 
Milky Way galaxy alone, could ours be the only one accompanied by an inhabited 
planet? How much more likely it is that technical civilizations are a cosmic 
commonplace, that the galaxy is pulsing and humming with advanced societies, and, 
therefore, that the nearest such culture is not so very far away — perhaps transmitting 
from antennas established on a planet of a naked-eye star just next door. Perhaps 
when we look up at the sky at night, near one of those faint pinpoints of light is a world 
on which someone quite different from us is then glancing idly at a star we call the Sun 
and entertaining, for just a moment, an outrageous speculation. 
It is very hard to be sure. There may be several impediments to the evolution of a
technical civilization. Planets may be rarer than we think. Perhaps the origin of life is 
not so easy as our laboratory experiments suggest. Perhaps the evolution of advanced 
life forms is improbable. Or it may be that complex life forms evolve more readily, but 
intelligence and technical societies require an unlikely set of coincidences — just as the 
evolution of the human species depended on the demise of the dinosaurs and the ice- 
age recession of the forests in whose trees our ancestors screeched and dimly 
wondered. Or perhaps civilizations arise repeatedly, inexorably, on innumerable planets 
in the Milky Way, but are generally unstable; so all but a tiny fraction are unable to 
survive their technology and succumb to greed and ignorance, pollution and nuclear 
war. 


It is possible to explore this great issue further and make a crude estimate of N, the 
number of advanced civilizations in the galaxy. We define an advanced civilization as 
one capable of radio astronomy. This is, of course, a parochial if essential definition. 
There may be countless worlds on which the inhabitants are accomplished linguists or 
superb poets but indifferent radio astronomers. We will not hear from them. N can be 
written as the product or multiplication of a number of factors, each a kind of filter, 
every one of which must be sizable for there to be a large number of civilizations: 


• Ns, the number of stars in the Milky Way galaxy.

• fp, the fraction of stars that have planetary systems.

• ne, the number of planets in a given system that are ecologically suitable for life.

• fl, the fraction of otherwise suitable planets on which life actually arises.

• fi, the fraction of inhabited planets on which an intelligent form of life evolves.

• fc, the fraction of planets inhabited by intelligent beings on which a communicative
technical civilization develops.

• fL, the fraction of planetary lifetime graced by a technical civilization.
Written out, the equation reads 


$$N = N_s \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot f_L \tag{7}$$


All of the f's are fractions, having values between 0 and 1; they will pare down the
large value of Ns.


To derive N we must estimate each of these quantities. We know a fair amount about
the early factors in the equation, the number of stars and planetary systems. We know
very little about the later factors, concerning the evolution of intelligence or the lifetime
of technical societies. In these cases our estimates will be little better than guesses. I
invite you, if you disagree with my estimates below, to make your own choices and see
what implications your alternative suggestions have for the number of advanced 
civilizations in the galaxy. One of the great virtues of this equation, due to Frank Drake 
of Cornell, is that it involves subjects ranging from stellar and planetary astronomy to 
organic chemistry, evolutionary biology, history, politics and abnormal psychology. 
Much of the Cosmos is in the span of the Drake equation. 


UNCLASSIFIED/ 




We know Ns, the number of stars in the Milky Way galaxy, fairly well, by careful counts
of stars in a small but representative region of the sky. It is a few hundred billion; some
recent estimates place it at 4 × 10^11. Very few of these stars are of the massive
short-lived variety that squander their reserves of thermonuclear fuel. The great majority
have lifetimes of billions or more years in which they are shining stably, providing a
suitable energy source for the origin and evolution of life on nearby planets.


There is evidence that planets are a frequent accompaniment of star formation: in the 
satellite systems of Jupiter, Saturn and Uranus, which are like miniature solar systems; 
in theories of the origin of the planets; in studies of double stars; in observations of 
accretion disks around stars; and in some preliminary investigations of gravitational
perturbations of nearby stars.¹ Many, perhaps even most, stars may have planets. We
take the fraction of stars that have planets, fp, as roughly equal to 1/3. Then the total
number of planetary systems in the galaxy would be Ns fp ~ 1.3 × 10^11 (the symbol ~
means “approximately equal to”). If each system were to have about ten planets, as 
ours does, the total number of worlds in the galaxy would be more than a trillion, a vast 
arena for the cosmic drama. 


In our own solar system there are several bodies that may be suitable for life of some 
sort: the Earth certainly, and perhaps Mars, Titan and Jupiter. Once life originates, it 
tends to be very adaptable and tenacious. There must be many different environments 
suitable for life in a given planetary system. But conservatively we choose ne = 2. Then
the number of planets in the galaxy suitable for life becomes Ns fp ne ~ 3 × 10^11.


Experiments show that under the most common cosmic conditions the molecular basis 
of life is readily made, the building blocks of molecules able to make copies of 
themselves. We are now on less certain grounds; there may, for example, be 
impediments in the evolution of the genetic code, although I think this is unlikely over 
billions of years of primeval chemistry. We choose fl ~ 1/3, implying a total number of
planets in the Milky Way on which life has arisen at least once as Ns fp ne fl ~ 1 × 10^11,
a hundred billion inhabited worlds. That in itself is a remarkable conclusion. But we are 
not yet finished. 


The choices of fi and fc are more difficult. On the one hand, many individually unlikely
steps had to occur in biological evolution and human history for our present intelligence
and technology to develop. On the other hand, there must be quite different pathways
to an advanced civilization of specified capabilities. Considering the apparent difficulty
in the evolution of large organisms, represented by the Cambrian explosion, let us
choose fi × fc = 1/100, meaning that only 1 per cent of planets on which life arises
actually produce a technical civilization. This estimate represents some middle ground 
among the varying scientific options. Some think that the equivalent of the step from 
the emergence of trilobites to the domestication of fire goes like a shot in all planetary 
systems; others think that, even given ten or fifteen billion years, the evolution of a 
technical civilization is unlikely. This is not a subject on which we can do much 
experimentation as long as our investigations are limited to a single planet. Multiplying
these factors together, we find Ns fp ne fl fi fc ~ 1 × 10^9, a billion planets on which
technical civilizations have arisen at least once. But that is very different from saying
that there are a billion planets on which technical civilizations now exist. For this we
must also estimate fL.

¹ Carl Sagan was writing these lines back in the 1970s, when no extrasolar planets had yet been discovered. The
first such discovery around a Sun-like star occurred in 1995, when Michel Mayor and Didier Queloz, working at the
"Observatoire de Haute Provence" in France, discovered a planet orbiting the nearby star 51 Peg. This planet was
hence named 51 Peg b. Many more extrasolar planets have been discovered around nearby stars ever since.
As of April 2009, 347 extrasolar planets (exoplanets) were listed in the Extrasolar Planets Encyclopaedia.


What percentage of the lifetime of a planet is marked by a technical civilization? The 
Earth has harbored a technical civilization characterized by radio astronomy for only a 
few decades out of a lifetime of a few billion years. So far, then, for our planet fL is less
than 1/10^8, a millionth of a percent. And it is hardly out of the question that we might
destroy ourselves tomorrow. Suppose this were a typical case, and the destruction so
complete that no other technical civilization (of the human or any other species) were
able to emerge in the five or so billion years remaining before the Sun dies. Then Ns fp
ne fl fi fc fL ~ 10, and, at a given time, there would be only a tiny smattering, a handful,
a pitiful few technical civilizations in the galaxy, the steady state number maintained as
emerging societies replace those recently self-immolated. The number N might be even
as small as 1 if civilizations tend to destroy themselves soon after reaching a 
technological phase; there might be no one for us to talk with but ourselves. And that 
we do but poorly. Civilizations would take billions of years of tortuous evolution, and 
then snuff themselves out in an instant of unforgivable neglect. 


But consider the alternative, the prospect that at least some civilizations learn to live 
with technology; that the contradictions posed by the vagaries of past brain evolution 
are consciously resolved and do not lead to self-destruction; or that, even if major
disturbances occur, they are reversed in the subsequent billions of years of biological
evolution. Such societies might live to a prosperous old age, their lifetimes measured
perhaps on geological or stellar evolutionary time scales. If 1 percent of civilizations can
survive technological adolescence, take the proper fork at this critical historical branch
point and achieve maturity, then fL ~ 1/100, N ~ 10^7, and the number of extant
civilizations in the galaxy is in the millions. Thus, for all our concern about the possible
unreliability of our estimates of the early factors in the Drake equation, which involve
astronomy, organic chemistry and evolutionary biology, the principal uncertainty comes
down to economics and politics and what, on Earth, we call human nature. It seems fairly
clear that if self-destruction is not the overwhelmingly preponderant fate of galactic
civilizations, then the sky is softly humming with messages from the stars. 


These estimates are stirring. They suggest that the receipt of a message from space is, 
even before we decode it, a profoundly hopeful sign. It means that someone has 
learned to live with high technology; that it is possible to survive technological 
adolescence. This alone, quite apart from the contents of the message, provides a 
powerful justification for the search for other civilizations. 
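The arithmetic of Sagan's estimates above can be retraced step by step. The Python sketch below is our own illustration, not part of Sagan's text: it multiplies his chosen factors exactly with `fractions.Fraction`, treating fi × fc as the single combined value 1/100 that he uses, and then applies his pessimistic (fL ~ 1/10^8) and optimistic (fL ~ 1/100) lifetime fractions. The constant names are hypothetical.

```python
from fractions import Fraction

# Sagan's illustrative choices, copied from the quoted text (exact fractions).
N_S = 4 * 10**11                  # Ns: stars in the Milky Way
F_P = Fraction(1, 3)              # fp: fraction of stars with planets
N_E = 2                           # ne: suitable planets per system
F_LIFE = Fraction(1, 3)           # fl: fraction of suitable planets where life arises
F_I_TIMES_F_C = Fraction(1, 100)  # fi x fc: intelligence and technology, combined


def drake_product(f_lifetime: Fraction) -> Fraction:
    """Classical Drake equation (7) with Sagan's factors and a chosen fL."""
    return N_S * F_P * N_E * F_LIFE * F_I_TIMES_F_C * f_lifetime


# Running product, to compare with the intermediate figures in the quote.
running = Fraction(1)
for label, factor in [("Ns", N_S), ("fp", F_P), ("ne", N_E),
                      ("fl", F_LIFE), ("fi*fc", F_I_TIMES_F_C)]:
    running *= factor
    print(f"after {label:<6} {float(running):.3g}")

pessimistic = drake_product(Fraction(1, 10**8))  # fL ~ 1/10^8: early self-destruction
optimistic = drake_product(Fraction(1, 100))     # fL ~ 1/100: 1% reach maturity
print(f"pessimistic N ~ {float(pessimistic):.3g}")
print(f"optimistic  N ~ {float(optimistic):.3g}")
```

The exact results are 80/9 ≈ 8.9 in the pessimistic case, which Sagan rounds to ~10, and about 8.9 million in the optimistic case, i.e. N ~ 10^7. Using exact fractions keeps the small factors from accumulating floating-point rounding.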


4. The Drake Equation is Over-Simplified 


In the nearly fifty years (1961-2009) that have elapsed since Frank Drake proposed his equation,
a number of scientists and writers have tried to determine which numerical values of its seven
independent variables are the most realistic in the light of present-day knowledge.
Thus there is a considerable amount of literature about the Drake equation nowadays,
and, as one can easily imagine, the results obtained by the various authors differ
widely from one another. In other words, the value of N that various authors obtained,
under different assumptions about the astronomy, the biology and the sociology implied by
the Drake equation, may range from a few tens (in the pessimist's view) to some
millions or even billions (in the optimist's opinion). A lot of uncertainty thus affects our
knowledge of N as of 2010. In all cases, however, the final result about N has always
been a sheer number, i.e., a positive integer ranging from 1 to millions or
billions. This is precisely the aspect of the Drake equation that this author regarded as
"too simplistic" and improved mathematically in his paper #IAC-08-A4.1.4, entitled
"The Statistical Drake Equation" and presented on October 1st, 2008, at the 59th
International Astronautical Congress (IAC) held in Glasgow, Scotland, UK, September
29th through October 3rd, 2008. That paper is attached herewith as Appendix B. Newcomers
to SETI and to the Drake equation, however, may find that paper too difficult to
understand mathematically at a first reading. Thus, I shall now explain the content of
that paper "by speaking easily." I thank the reader for his or her attention.


5. The Statistical Drake Equation 


We start with an example.


Consider the first independent variable in the Drake equation (7), i.e., Ns, the number
of stars in the Milky Way galaxy. Astronomers tell us that there should be approximately
350 billion stars in the galaxy. Of course, nobody has counted (or even seen
in photographic plates) all the stars in the galaxy! There are too many practical
difficulties preventing us from doing so: just to name one, the dust clouds that don't
allow us to see even the Galactic Bulge (i.e., the central region of the galaxy) in
visible light (although we may "see it" at radio frequencies, like the famous neutral
hydrogen line at 1420 MHz). So, it doesn't make any sense to say that Ns = 350 × 10^9 exactly,
or (even worse) that the number of stars in the galaxy is, say, 354,233,321,158, or
similar fanciful exact integer numbers. That is just silly and non-scientific. Much more
scientific, on the contrary, is to say that the number of stars in the galaxy is 350 billion
plus or minus, say, 50 billion (or whatever values the astronomers may regard as
more appropriate, since this is just an example to let the reader understand the
difficulty).


Thus, it makes sense to REPLACE each of the seven independent variables in the Drake
equation (7) by a MEAN VALUE (350 billion, in the above example) PLUS OR MINUS A
CERTAIN STANDARD DEVIATION (50 billion, in the above example).


By doing so, we have made a great step forward: we have abandoned the too-simplistic
equation (7) and replaced it by something more sophisticated and scientifically more 
serious: the STATISTICAL Drake equation. In other words, we have transformed the 
classical and simplistic Drake equation (7) into an advanced statistical tool for the 
investigation of a host of facts hardly known to us in detail. In other words still: 


• We replace each independent variable in (7) by a RANDOM VARIABLE, labeled
$D_i$ (from Drake).


• We assume that the MEAN VALUE of each $D_i$ is the same numerical value previously
attributed to the corresponding independent variable in (7).

• But now we also ADD A STANDARD DEVIATION $\sigma_{D_i}$ on each side of the mean value,
provided by the knowledge gathered by scientists in each discipline
encompassed by each $D_i$.




Having so done, the next question is: 


How can we find out the PROBABILITY DISTRIBUTION of each $D_i$?


For instance, should it be a Gaussian, or something else?


This is a difficult question, for nobody knows, for instance, the probability distribution of
the number of stars in the galaxy, not to mention the probability distributions of the
other six variables in the Drake equation (7).


There is a brilliant way to get around this difficulty, though. 


We start by excluding the Gaussian, because each variable in the Drake equation is a
POSITIVE (or, more precisely, a non-negative) random variable, while a Gaussian
ranges over ALL REAL values, negative ones included. So, the Gaussian is out. Then, one might
consider the large class of well-studied positive probability densities called "the
gamma distributions," but it is then unclear why one should adopt the gamma
distributions rather than any other. The solution to this apparent conundrum comes from
Shannon's Information Theory and a theorem that he proved in 1948: "The probability
distribution having maximum entropy (= uncertainty) over any FINITE range of real
values is the UNIFORM distribution over that range." This is proven in Appendix A of the
present document.
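A quick numerical illustration of Shannon's result (not a proof; Appendix A gives that): on a fixed range [a, b], the uniform density has differential entropy ln(b - a), while, for instance, the symmetric triangular density on the same range has the smaller entropy 1/2 + ln((b - a)/2). The Python sketch below computes both by a midpoint-rule integral of -f ln f. The range [2, 5] and the triangular competitor are arbitrary choices of ours, made only for illustration.

```python
import math


def differential_entropy(pdf, lo: float, hi: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of -integral of f(x) ln f(x) over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        fx = pdf(lo + (k + 0.5) * dx)
        if fx > 0.0:  # f ln f -> 0 as f -> 0, so zero-density points add nothing
            total -= fx * math.log(fx) * dx
    return total


A, B = 2.0, 5.0  # an arbitrary finite range, for illustration only
WIDTH = B - A
MID = (A + B) / 2.0


def uniform_pdf(x: float) -> float:
    return 1.0 / WIDTH


def triangular_pdf(x: float) -> float:
    # Symmetric triangle on [A, B] with peak 2/WIDTH at MID; total area is 1.
    return (2.0 / WIDTH) * (1.0 - abs(x - MID) / (WIDTH / 2.0))


h_uniform = differential_entropy(uniform_pdf, A, B)
h_triangular = differential_entropy(triangular_pdf, A, B)
print(f"uniform:    {h_uniform:.6f}  (exact {math.log(WIDTH):.6f})")
print(f"triangular: {h_triangular:.6f}  (exact {0.5 + math.log(WIDTH / 2.0):.6f})")
```

Whatever range is chosen, the uniform entropy exceeds the triangular one by ln 2 - 1/2 ≈ 0.19 nats, in line with the theorem.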


So, at this point, we assume that each of the seven $D_i$ in (7) is a UNIFORM random
variable, whose mean value and standard deviation are known by the scientists working
in the respective field (be it astronomy, biology, or sociology). Notice that, for
such a uniform distribution, knowledge of the mean value $\mu_{D_i}$ and of the standard
deviation $\sigma_{D_i}$ automatically determines the RANGE of that random variable between
its lower limit (called $a_i$) and its upper limit (called $b_i$): in fact these limits are given by the
equations


$$\begin{cases} a_i = \mu_{D_i} - \sqrt{3}\,\sigma_{D_i} \\ b_i = \mu_{D_i} + \sqrt{3}\,\sigma_{D_i} \end{cases} \tag{8}$$


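(the "surprising" factor √3 in the above equations comes from the definitions of mean
value and standard deviation: for a uniform distribution on $[a_i, b_i]$ the mean is
$(a_i + b_i)/2$ and the variance is $(b_i - a_i)^2/12$, so the half-width $(b_i - a_i)/2$
equals $\sqrt{3}\,\sigma_{D_i}$; please see equations (12), (15) and (17) in Appendix B
for the relevant proof). So the uniform distribution of each random variable $D_i$ is
perfectly determined by its mean value and standard deviation, and so are all its other
properties.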
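Equation (8) and its inverse are easy to check numerically. In the Python sketch below (the function names are hypothetical, ours), `uniform_limits` turns a mean and standard deviation into the range [a_i, b_i], and `uniform_mean_std` goes back using the standard uniform formulas mean = (a + b)/2 and standard deviation = (b - a)/√12. The example reuses the 350 ± 50 figures of Section 5, in whatever unit they are expressed.

```python
import math


def uniform_limits(mean: float, std: float) -> tuple[float, float]:
    """Equation (8): limits a = mean - sqrt(3)*std and b = mean + sqrt(3)*std."""
    half_width = math.sqrt(3.0) * std
    lower, upper = mean - half_width, mean + half_width
    if lower < 0.0:
        # Drake variables are non-negative, so a negative lower limit is inconsistent.
        raise ValueError("mean must be at least sqrt(3) * std for a non-negative variable")
    return lower, upper


def uniform_mean_std(lower: float, upper: float) -> tuple[float, float]:
    """Inverse check: mean (a + b)/2 and standard deviation (b - a)/sqrt(12)."""
    return (lower + upper) / 2.0, (upper - lower) / math.sqrt(12.0)


a, b = uniform_limits(350.0, 50.0)  # the 350 +/- 50 example, in the example's own unit
print(a, b)                         # roughly 263.4 and 436.6
print(uniform_mean_std(a, b))       # recovers (350.0, 50.0) up to rounding
```

The non-negativity check reflects the point made earlier in this section: a uniform $D_i$ is only admissible when $\mu_{D_i} \ge \sqrt{3}\,\sigma_{D_i}$.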


The next problem is the following: 


OK, since we now know everything about each uniformly distributed $D_i$, what is the
probability distribution of N, given that N is the product (7) of all the $D_i$?


In other words, not only do we want to find the analytical expression of the probability
density function of N, but we also want to relate its mean value $\mu_N$ to all the mean values
$\mu_{D_i}$ of the $D_i$, and its standard deviation $\sigma_N$ to all the standard deviations $\sigma_{D_i}$ of the $D_i$.
This is a difficult problem.
It occupied the author's mind for about ten years (1997-2007).


It is actually an ANALYTICALLY UNSOLVABLE problem, in that, to the best of this
author's knowledge, it is IMPOSSIBLE to find an analytic expression for any FINITE
PRODUCT of uniform random variables $D_i$. This result is proven in Sections 2 through 3.3 of
Appendix B (unfortunately!).


6. Solving the Statistical Drake Equation By Virtue of the Central Limit Theorem (CLT) of Statistics


The solution to the problem of finding the analytical expression for the probability
density function of N in the statistical Drake equation was found by this author in
September 2007. The key steps are the following:


• Take the natural logs of both sides of the statistical Drake equation (7). This
changes the product into a sum.


• The mean values and standard deviations of the logs of the random variables $D_i$
may all be expressed analytically in terms of the mean values and standard
deviations of the $D_i$.


• Recall the Central Limit Theorem (CLT) of statistics, stating that (loosely speaking) if
you have a SUM of independent random variables, each of which is ARBITRARILY
DISTRIBUTED (hence, also including uniformly distributed) with finite variance, then, when the number
of terms in the sum increases indefinitely (i.e., for an infinitely long sum of random
variables), the SUM RANDOM VARIABLE TENDS TO A GAUSSIAN.


• Thus, the natural log of N tends to a Gaussian.

• Thus, N tends to the LOGNORMAL DISTRIBUTION.


• The mean value and standard deviation of this lognormal distribution of N may all
be expressed analytically in terms of the mean values and standard deviations of
the logs of the $D_i$ already found previously.
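The steps above can be checked numerically. The Python sketch below relies on the closed-form moments of ln D for D uniform on [a, b] with a > 0, the same expressions summarized for μ and σ² in Table 1: E[ln D] = (b ln b - a ln a)/(b - a) - 1 and Var[ln D] = 1 - ab(ln b - ln a)²/(b - a)². It sums them over seven factors to get μ and σ² of ln N, then compares them with a Monte Carlo estimate. The seven ranges and the function name `log_moments` are hypothetical placeholders of ours, not the inputs of Table 2.

```python
import math
import random
import statistics


def log_moments(a: float, b: float) -> tuple[float, float]:
    """Mean and variance of ln(D) for D uniform on [a, b], with 0 < a < b."""
    mean = (b * math.log(b) - a * math.log(a)) / (b - a) - 1.0
    var = 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2
    return mean, var


# Hypothetical (a_i, b_i) ranges for seven Drake factors: illustrative only,
# NOT the inputs of Table 2.
RANGES = [
    (2.0e11, 5.0e11),
    (0.2, 0.8),
    (0.5, 1.5),
    (0.1, 0.9),
    (0.05, 0.35),
    (0.05, 0.35),
    (1.0e-7, 1.9e-6),
]

# Step 1-2: ln N is a sum of logs, so its mean and variance are sums too
# (variances add because the factors are independent).
mu = sum(log_moments(a, b)[0] for a, b in RANGES)
sigma2 = sum(log_moments(a, b)[1] for a, b in RANGES)

# Monte Carlo check: sample every D_i and take ln of the product as a sum of logs.
rng = random.Random(2008)
samples = [
    sum(math.log(rng.uniform(a, b)) for a, b in RANGES)
    for _ in range(100_000)
]
print(f"mu:      analytic {mu:.4f}, simulated {statistics.fmean(samples):.4f}")
print(f"sigma^2: analytic {sigma2:.4f}, simulated {statistics.pvariance(samples):.4f}")
```

Adding an eighth factor, in the spirit of the "Data Enrichment Principle" discussed below, would simply add one more term to each of the two sums.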


This result is fundamental. 


All the relevant equations are summarized in the following Table 1. This table is actually 
the same as Table 2 of the author’s original paper IAC-08-A4.1.4, entitled “The 
Statistical Drake Equation” and presented by him at the International Astronautical 
Congress (IAC) held in Glasgow, UK, on October 1st, 2008. This original paper is
reproduced in Appendix B. 


To sum up, not only is it found that N approaches the completely known lognormal
distribution for an INFINITY of factors in the statistical Drake equation (7), but the way 
is paved to further applications by removing the condition that the number of terms in 
the product (7) must be FINITE. 
This possibility of ADDING ANY NUMBER OF FACTORS IN THE DRAKE EQUATION (7)
was not envisaged, of course, by Frank Drake back in 1961, when "summarizing" the
evolution of life in the galaxy in SEVEN simple STEPS. But today, the number of factors
in the Drake equation should already be increased: for instance, there is no mention in
the original Drake equation of the possibility that asteroidal impacts might destroy
life on Earth at any time. This is because the role of the K/T impact in the demise of
the dinosaurs had not yet been understood by scientists in 1961, and was recognized only in 1980!


In practice, the number of factors should INCREASE as much as necessary in order to
get better and better estimates of N as our scientific knowledge increases. This
is what we call the "Data Enrichment Principle," and we believe it should be the next important goal
in the study of the statistical Drake equation.


Finally, a numerical example explaining how the statistical Drake equation works in
practice will be given in the next section.
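Table 1. Summary of the Properties of the Lognormal Distribution That Applies
to the Random Variable N = Number of ET Communicating Civilizations in the
Galaxy

Random variable: N = number of communicating ET civilizations in the galaxy
Probability distribution: lognormal
Probability density function: $f_N(n; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma\,n}\, e^{-\frac{(\ln n - \mu)^2}{2\sigma^2}}$ (n > 0)
Mean value: $\langle N \rangle = e^{\mu}\, e^{\sigma^2/2}$
Standard deviation: $\sigma_N = e^{\mu}\, e^{\sigma^2/2} \sqrt{e^{\sigma^2} - 1}$
All the moments, i.e. k-th moment: $\langle N^k \rangle = e^{k\mu}\, e^{k^2\sigma^2/2}$
Mode (= abscissa of the lognormal peak): $n_{mode} = n_{peak} = e^{\mu}\, e^{-\sigma^2}$
Value of the Mode Peak: $f_N(n_{mode}) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\mu}\, e^{\sigma^2/2}$
Skewness: $K_3 = (e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$
Kurtosis: $K_4 = e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 6$
Expression of μ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\mu = \sum_{i=1}^{7} \frac{b_i(\ln b_i - 1) - a_i(\ln a_i - 1)}{b_i - a_i}$
Expression of σ² in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\sigma^2 = \sum_{i=1}^{7} \left[1 - \frac{a_i b_i (\ln b_i - \ln a_i)^2}{(b_i - a_i)^2}\right]$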


7. An Example Explaining the Statistical Drake Equation 


To understand how things work in practice for the statistical Drake equation, please
consider the following Table 2. It is made up of three columns:


• The first column on the left lists the seven input sheer numbers, which also become
the mean values (middle column).

• The last column on the right lists the seven input standard deviations.
The bottom line is the classical Drake equation (7). We see that, for this particular set
of seven inputs, the classical Drake equation (i.e., the product of the seven numbers)
yields a total of 3500 communicating extraterrestrial civilizations existing in the galaxy
right now.


[Table 2, bottom line: $N = N_s \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot f_L$]


Table 2. Input Values (i.e., mean values and standard deviations) for the Seven Drake Uniform Random
Variables $D_i$. The first column on the left lists the seven input sheer numbers that also become the mean values
(middle column). Finally the last column on the right lists the seven input standard deviations. The bottom line is
the classical Drake equation (7).


The statistical Drake equation, however, provides a much more articulated answer than just the above sheer number N = 3500. In fact, a MathCad code written by this author, capable of performing all the numerical calculations required by the statistical Drake equation for a given set of seven input mean values plus seven input standard deviations, yields for N the lognormal distribution (thin curve) plotted in Figure 2. We see immediately that the peak of this thin curve (i.e. the mode) falls at about

$n_{\text{mode}} = n_{\text{peak}} = e^{\mu}\, e^{-\sigma^2} \approx 250$

(this is equation (99) of Appendix B), while the median (the fifty-fifty value splitting the lognormal density into two parts with equal areas underneath) falls at about $n_{\text{median}} = e^{\mu} \approx 1740$. These seem to be smaller values than the N = 3500 provided by the classical Drake equation, but that is a wrong impression due to a poor “intuitive” understanding of what statistics is! In fact, neither the mode nor the median is the “really important” value: the really important value for N is the MEAN VALUE! Now if you look at the thin curve in Figure 2 below (i.e. the lognormal distribution arising from the Central Limit Theorem), you see that this curve has a LONG TAIL ON THE RIGHT! In other words, it does NOT immediately go down to nearly zero beyond the peak of the mode. Thus, when you actually compute the mean value, you should not be too surprised to find that it equals

$\langle N \rangle = e^{\mu}\, e^{\sigma^2/2} \approx 4589.559 \approx 4590$

communicating civilizations now in the Galaxy. This is the important number, and it is HIGHER than the 3500 provided by the classical Drake equation. Thus, in conclusion, THE STATISTICAL EXTENSION of the classical Drake equation INCREASES OUR HOPES to find an extraterrestrial civilization!
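The numbers quoted above are tied together by the formulas of Table 1. As a rough consistency sketch in Python, assuming only the rounded median (1740) and mean (4590) from the text, one can back out $\sigma^2$ from the ratio mean/median $= e^{\sigma^2/2}$ and then predict the mode and the standard deviation:

```python
import math

median = 1740.0  # n_median = e^mu, as quoted in the text
mean = 4590.0    # <N> = e^mu * e^(sigma^2 / 2), as quoted in the text

sigma2 = 2.0 * math.log(mean / median)          # since mean / median = e^(sigma^2 / 2)
mode = median * math.exp(-sigma2)               # e^mu * e^(-sigma^2)
std = mean * math.sqrt(math.exp(sigma2) - 1.0)  # <N> * sqrt(e^(sigma^2) - 1)

print(f"sigma^2 ~ {sigma2:.2f}, mode ~ {mode:.0f}, std ~ {std:.0f}")
```

This gives $\sigma^2 \approx 1.94$, a mode of about 250 and a standard deviation of about 11,200, within 0.1 percent of the value 11,195 given in the next paragraph; the small gap comes from rounding the median and mean.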


[Figure 2: Probability density function of N. Horizontal axis: N = Number of ET Civilizations in Galaxy; vertical axis: probability density function of N.]


Figure 2. Comparing the Two Probability Density Functions of the Random Variable N Found (1) 
Without Resorting to the CLT at All (thick curve) and (2) Using the CLT and the Relevant Lognormal 
Approximation (thin curve). 


Our hopes are increased even more when we go on to consider the standard deviation associated with the mean value 4590. In fact, the standard deviation is given by equation (97) of Appendix B. This yields

$\sigma_N = e^{\mu}\, e^{\sigma^2/2}\sqrt{e^{\sigma^2} - 1} \approx 11195$

and so the actual number N of civilizations may be even much higher than the 4590 provided by the mean value alone! The “upper limit of the one-sigma confidence interval” (as statisticians call it), i.e. the sum 4590 + 11195 = 15,785, yields a higher number still! (Note: the “lower limit of the one-sigma confidence interval” is ZERO, because 4590 − 11195 is negative while the lognormal distribution is POSITIVE, or, more correctly, non-negative.) Finally, the reader should note that the thick curve depicted in Figure 2 is just the NUMERICAL solution of the statistical Drake equation for a FINITE number of 7 input factors. Figure 2 actually shows that this curve “is well interpolated” by the lognormal distribution (thin curve), i.e., by the neat analytical expression provided by the Central Limit Theorem for an INFINITE number of factors in the Drake equation. That is, in conclusion, Figure 2 visually shows that, already with as few as 7 factors, taking 7 factors or an infinity of factors “is almost the same thing.”
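The claim that 7 factors already behave almost like infinitely many can be probed by simulation. The Python sketch below is a hedged illustration: the seven (mean, standard deviation) pairs are hypothetical placeholders, not the entries of Table 2. It converts them to uniform limits with equations (17) of Appendix B, draws $Y = \ln N$ as a sum of seven logs, and compares the sample mean and variance of Y with the analytic $\mu$ and $\sigma^2$ of Table 1, plus the fraction of draws within one $\sigma$ of $\mu$ (about 0.683 for an exact Gaussian).

```python
import math
import random
import statistics


def uniform_limits(mean, std):
    """Equations (17) of Appendix B: limits of a uniform variable with given mean and std."""
    half_width = math.sqrt(3.0) * std
    return mean - half_width, mean + half_width


def log_uniform_moments(a, b):
    """Mean and variance of ln(D) for D uniform on [a, b] (Table 1 summands)."""
    log_a, log_b = math.log(a), math.log(b)
    mean = (b * (log_b - 1.0) - a * (log_a - 1.0)) / (b - a)
    var = 1.0 - a * b * (log_b - log_a) ** 2 / (b - a) ** 2
    return mean, var


# Hypothetical (mean, std) pairs for Ns, fp, ne, fl, fi, fc, fL: illustration only.
INPUTS = [(350e9, 1e9), (0.5, 0.1), (1.0, 0.2), (0.5, 0.1),
          (0.2, 0.05), (0.2, 0.05), (1e-6, 1e-7)]
LIMITS = [uniform_limits(m, s) for m, s in INPUTS]

mu = sum(log_uniform_moments(a, b)[0] for a, b in LIMITS)
sigma2 = sum(log_uniform_moments(a, b)[1] for a, b in LIMITS)

# Each draw of Y = ln N is the sum of the logs of seven independent uniform draws.
rng = random.Random(12345)
samples = [sum(math.log(rng.uniform(a, b)) for a, b in LIMITS) for _ in range(50_000)]

sd = math.sqrt(sigma2)
within_one_sigma = sum(abs(y - mu) <= sd for y in samples) / len(samples)

print(f"mu:      analytic {mu:.4f}, sample {statistics.fmean(samples):.4f}")
print(f"sigma^2: analytic {sigma2:.4f}, sample {statistics.variance(samples):.4f}")
print(f"fraction within one sigma: {within_one_sigma:.3f} (Gaussian: 0.683)")
```

A histogram of `samples` against the Gaussian with mean $\mu$ and variance $\sigma^2$ would give a picture analogous to Figure 2, in log scale.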
8. Finding the Probability Distribution of the ET-Distance By
Virtue of the Statistical Drake Equation


Having solved the statistical Drake equation by finding the lognormal distribution, we are now in a position to solve the ET-DISTANCE problem by resorting to statistics again, rather than just to the purely deterministic Distance Law (5), as we did in Section 2. This is “scientifically more serious” than the purely deterministic Distance Law (5) inasmuch as the new statistical Distance Law yields a PROBABILITY DENSITY for the Distance, with the relevant mean value and standard deviation. In other words, the Distance given by the Distance Law (5) itself becomes a random variable whose probability distribution, mean value and standard deviation must be computed by “replacing” into (5) the fact that N is now known to follow the lognormal distribution. This is mathematically described in detail in Section 7 of Appendix B.


The important new result is the PROBABILITY DENSITY FOR THE DISTANCE, whose equation is

$f_{\mathrm{ET\_Distance}}(r) = \frac{3}{\sqrt{2\pi}\,\sigma\, r}\, e^{-\frac{[\ln(C^3/r^3) - \mu]^2}{2\sigma^2}}$   (9)

holding for $r \ge 0$. This is equation (114) of Appendix B.


Starting from this equation, the MEAN VALUE OF THE random variable ET_DISTANCE is computed as

$\langle \mathrm{ET\_Distance} \rangle = C\, e^{-\mu/3}\, e^{\sigma^2/18}$   (10)

which is equation (119) of Appendix B, and finally the ET_DISTANCE STANDARD DEVIATION is

$\sigma_{\mathrm{ET\_Distance}} = C\, e^{-\mu/3}\, e^{\sigma^2/18}\sqrt{e^{\sigma^2/9} - 1}$   (11)

which is equation (123) of Appendix B. Of course, all other descriptive statistical quantities, such as moments, cumulants, etc., can be computed starting from the probability density (9), and the results are collected in the table hereafter, which is Table 3 of Appendix B.
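Equations (9), (10) and (11) can be cross-checked numerically. The following Python sketch uses hypothetical values of $\mu$, $\sigma$ and C, chosen only for illustration; it integrates the density (9) with a simple midpoint rule and compares the total probability, the mean and the standard deviation with the closed forms (10) and (11):

```python
import math


def et_distance_pdf(r, mu, sigma, c):
    """Equation (9): density of ET_Distance for r >= 0."""
    if r <= 0.0:
        return 0.0
    z = math.log(c**3 / r**3) - mu
    return 3.0 / (math.sqrt(2.0 * math.pi) * sigma * r) * math.exp(-z * z / (2.0 * sigma**2))


def midpoint_integral(f, lo, hi, n=200_000):
    """Plain midpoint rule; adequate for this smooth, fast-decaying integrand."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


# Hypothetical parameters, chosen only to exercise the formulas (not Table 2 values).
MU, SIGMA, C = 2.0, 1.2, 10.0
R_MAX = 60.0  # the density is negligible beyond this point for these parameters


def pdf(r):
    return et_distance_pdf(r, MU, SIGMA, C)


total = midpoint_integral(pdf, 0.0, R_MAX)
mean_num = midpoint_integral(lambda r: r * pdf(r), 0.0, R_MAX)
second_num = midpoint_integral(lambda r: r * r * pdf(r), 0.0, R_MAX)
std_num = math.sqrt(second_num - mean_num**2)

mean_eq10 = C * math.exp(-MU / 3.0) * math.exp(SIGMA**2 / 18.0)
std_eq11 = mean_eq10 * math.sqrt(math.exp(SIGMA**2 / 9.0) - 1.0)

print(f"total probability {total:.6f}")
print(f"mean: numeric {mean_num:.5f} vs (10) {mean_eq10:.5f}")
print(f"std:  numeric {std_num:.5f} vs (11) {std_eq11:.5f}")
```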


Finally, to complete this section, as well as this “introduction to the statistical Drake equation,” let us determine the numerical values that equations (10) and (11) yield for the input values of Table 2. They are, respectively:

$\langle \mathrm{ET\_Distance} \rangle = C\, e^{-\mu/3}\, e^{\sigma^2/18} \approx 2670$ light years   (12)

which is equation (153) of Appendix B, and

$\sigma_{\mathrm{ET\_Distance}} = C\, e^{-\mu/3}\, e^{\sigma^2/18}\sqrt{e^{\sigma^2/9} - 1} \approx 1308$ light years   (13)

which is equation (154) of Appendix B.
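Table 3. Summary of the Properties of the Probability Distribution That Applies to the Random Variable ET_Distance Yielding the (Average) Distance Between Any Two Neighboring Communicating Civilizations in the Galaxy

Random variable: ET_Distance between any two neighboring ET civilizations in the Galaxy, assuming they are UNIFORMLY distributed throughout the whole Galaxy volume
Probability distribution: “Maccone distribution” (as dubbed by Paul Davies)
Probability density function: $f_{\mathrm{ET\_Distance}}(r) = \frac{3}{\sqrt{2\pi}\,\sigma\, r}\, e^{-\frac{[\ln(C^3/r^3) - \mu]^2}{2\sigma^2}}$ for $r \ge 0$
Numerical constant C related to the Milky Way size: the constant C of the Distance Law (5)
Mean value: $\langle \mathrm{ET\_Distance} \rangle = C\, e^{-\mu/3}\, e^{\sigma^2/18}$
Standard deviation: $\sigma_{\mathrm{ET\_Distance}} = C\, e^{-\mu/3}\, e^{\sigma^2/18}\sqrt{e^{\sigma^2/9} - 1}$
All the moments, i.e. k-th moment: $\langle \mathrm{ET\_Distance}^k \rangle = C^k\, e^{-k\mu/3}\, e^{k^2\sigma^2/18}$
Mode (= abscissa of the lognormal peak): $r_{\text{mode}} = r_{\text{peak}} = C\, e^{-\mu/3}\, e^{-\sigma^2/9}$
Value of the mode peak: $f_{\mathrm{ET\_Distance}}(r_{\text{peak}}) = \frac{3}{\sqrt{2\pi}\,\sigma\, C}\, e^{\mu/3}\, e^{\sigma^2/18}$
Median (= fifty-fifty probability value for ET_Distance): $r_{\text{median}} = C\, e^{-\mu/3}$
Skewness: $\left(e^{\sigma^2/9} + 2\right)\sqrt{e^{\sigma^2/9} - 1}$
Expression of $\mu$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\mu = \sum_{i=1}^{7} \frac{b_i[\ln(b_i) - 1] - a_i[\ln(a_i) - 1]}{b_i - a_i}$
Expression of $\sigma^2$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\sigma^2 = \sum_{i=1}^{7}\left[1 - \frac{a_i b_i\,[\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2}\right]$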
It is instructive to draw the graph of the ET_Distance probability density (9):

[Figure 3: Distance of nearest ET civilization. Horizontal axis: ET_Distance from Earth (light years); vertical axis: probability density function.]


Figure 3. The Probability of Finding the Nearest Extraterrestrial Civilization at the Distance r From Earth (in Light Years) if the Values Assumed in the Drake Equation Are Those Shown in Input Table 2. The relevant probability density function $f_{\mathrm{ET\_Distance}}(r)$ is given by equation (9). Its mode (peak abscissa) equals 1933 light years, but its mean value is higher since the curve has a long tail on the right: the mean value equals in fact 2670 light years. Finally, the standard deviation equals 1309 light years: THIS IS GOOD NEWS FOR SETI, inasmuch as the nearest ET civilization in the Galaxy might lie at just 2670 − 1309 = 1361 light years from us, i.e. one sigma below the mean.


From Figure 3 we see that the probability of finding extraterrestrials is practically zero up to a distance of about 500 light years from Earth. Then it starts increasing with the increasing distance from Earth, and reaches its maximum at

$r_{\text{mode}} = r_{\text{peak}} = C\, e^{-\mu/3}\, e^{-\sigma^2/9} \approx 1933$ light years.   (14)

This is the MOST LIKELY VALUE of the distance at which we can expect to find the nearest extraterrestrial civilization.

It is not, however, the mean value of the probability distribution (9) for $f_{\mathrm{ET\_Distance}}(r)$. In fact, the probability density (9) has an infinite tail on the right, as clearly shown in Figure 3, and this tail pulls its mean value above its peak value: dividing (10) by (14), the mean exceeds the mode by the factor $e^{\sigma^2/6} > 1$. As given by (10) and (12), its mean value is

$\langle \mathrm{ET\_Distance} \rangle = C\, e^{-\mu/3}\, e^{\sigma^2/18} \approx 2670$ light years.

This is the MEAN (value of the) DISTANCE at which we can expect to find extraterrestrials.
After having found the above two distances (1933 and 2670 light years, respectively), the next natural question that arises is: “what is the range, back and forth around the mean value of the distance, within which we can expect to find extraterrestrials with the highest hopes?” The answer to this question is given by the notion of standard deviation that we already found to be given by (11) and (13),

$\sigma_{\mathrm{ET\_Distance}} = C\, e^{-\mu/3}\, e^{\sigma^2/18}\sqrt{e^{\sigma^2/9} - 1} \approx 1309$ light years.

More precisely, this is the so-called 1-sigma (distance) level. Probability theory then shows that the nearest extraterrestrial civilization is expected to be located within this range, i.e. between the two distances of (2670 − 1309) = 1361 light years and (2670 + 1309) = 3979 light years, with probability given by the integral of $f_{\mathrm{ET\_Distance}}(r)$ taken between these two lower and upper limits, that is:

$\int_{1361\ \text{light years}}^{3979\ \text{light years}} f_{\mathrm{ET\_Distance}}(r)\, dr \approx 0.75 = 75\%$   (15)


In plain words: with 75 percent probability, the nearest extraterrestrial civilization is located between 1361 and 3979 light years from us, having assumed the input values to the Drake equation given in Table 2. If we change those input values, then all the numbers change again, of course.
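The figures 1309 light years and 75 percent follow from the two distances already quoted. In the Python sketch below (an illustration, not the author's code) we assume only the rounded mode (1933) and mean (2670): their ratio is $e^{\sigma^2/6}$, which fixes $\sigma^2$. Since $\ln r$ is Gaussian with standard deviation $\sigma/3$ around the log of the median, the probability (15) reduces to a difference of two normal CDF values, computed here with `math.erf`.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


r_mode, r_mean = 1933.0, 2670.0  # light years, as quoted in the text

# Mode (14) and mean (10): r_mean / r_mode = e^(sigma^2/18) / e^(-sigma^2/9) = e^(sigma^2/6).
sigma2 = 6.0 * math.log(r_mean / r_mode)
r_std = r_mean * math.sqrt(math.exp(sigma2 / 9.0) - 1.0)  # (11) is (10) times sqrt(e^(sigma^2/9) - 1)
r_median = r_mean * math.exp(-sigma2 / 18.0)             # C e^(-mu/3)
s = math.sqrt(sigma2) / 3.0                               # standard deviation of ln(r)

lo, hi = r_mean - r_std, r_mean + r_std
prob = normal_cdf(math.log(hi / r_median) / s) - normal_cdf(math.log(lo / r_median) / s)
print(f"sigma^2 ~ {sigma2:.2f}, std ~ {r_std:.0f} ly, interval [{lo:.0f}, {hi:.0f}] ly, P ~ {prob:.2f}")
```

With these inputs the sketch gives $\sigma^2 \approx 1.94$, a standard deviation of about 1309 light years, the interval from about 1361 to 3979 light years, and a probability of about 0.75.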


9. The “Data Enrichment Principle” as the Best CLT 
Consequence Upon the Statistical Drake Equation (Any 
Number of Factors Allowed) 


As a fitting climax to all the statistical equations developed so far, let us now state our 
“DATA ENRICHMENT PRINCIPLE.” It simply states that “The Higher the Number of 
Factors in the Statistical Drake equation, The Better.” 


Put in this simple way, it looks like just a new way of saying that the CLT lets the random variable Y approach the normal distribution when the number of terms in the sum (4) approaches infinity. And indeed it is.


10. Conclusions 


We have sought to extend the classical Drake equation to let it encompass Statistics 
and Probability. 


This approach appears to pave the way to future, more profound investigations
intended not only to associate “error bars” with each factor in the Drake equation, but
especially to increase the number of factors themselves. In fact, this seems to be the
only way to incorporate into the Drake equation more and more new scientific 
information as soon as it becomes available. In the long run, the Statistical Drake 
equation might just become a huge computer code, growing in size and especially in 
the depth of the scientific information it contains. It would thus be Humanity’s first 
“Encyclopaedia Galactica.” 
Unfortunately, to extend the Drake equation to Statistics, it was necessary to use a
mathematical apparatus that is more sophisticated than just the simple product of
seven numbers.
Appendix A: Proof of Shannon’s 1948 Theorem Stating
That the Uniform Distribution is the “Most Uncertain” One
Over a Finite Range of Values


Information Theory was initiated by Claude Shannon (1916-2001) in his two well-known 1948 papers:


[Image: title page of “A Mathematical Theory of Communication” by C. E. Shannon, reprinted with corrections from The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July, October, 1948.]


In this Appendix, we wish to draw attention to a couple of theorems that Shannon proves on pages 36 and 37 of his work, which read, respectively (note that Shannon omits the lower and upper limits of all integrals in the first theorem: they are minus infinity and plus infinity, respectively):


“Let p(x) be a one-dimensional distribution. The form of p(x) giving a maximum entropy subject to the condition that the standard deviation of x be fixed at σ is Gaussian. To show this we must maximize

$H(x) = -\int p(x) \log p(x)\, dx$

with

$\sigma^2 = \int p(x)\, x^2\, dx$ and $1 = \int p(x)\, dx$

as constraints. This requires, by the calculus of variations, maximizing

$\int \left[-p(x) \log p(x) + \lambda\, p(x)\, x^2 + \mu\, p(x)\right] dx.$

The condition for this is

$-1 - \log p(x) + \lambda x^2 + \mu = 0$

and consequently (adjusting the constants to satisfy the constraints)

$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}.$”


“If x is limited to a half line (p(x) = 0 for x ≤ 0) and the first moment of x is fixed at a:

$a = \int_0^\infty p(x)\, x\, dx,$

then the maximum entropy occurs when

$p(x) = \frac{1}{a}\, e^{-x/a}$

and is equal to log ea.”


Now, we wish to point out that there is a third possible case, other than the two given by Shannon. This is the case when the probability density function p(x) is limited to a FINITE INTERVAL $a \le x \le b$. This is obviously the case with any physical POSITIVE random variable, such as a distance, or the number N of extraterrestrial communicating civilizations in the Galaxy. And it is easy to prove that for any such finite random variable the maximum entropy distribution is the UNIFORM distribution over $a \le x \le b$. Shannon did not bother to prove this simple theorem in his 1948 papers since he probably regarded it as too trivial. But we prefer to point out this theorem since, in the language of the statistical Drake equation, it sounds like:


“Since we don’t know what the probability distribution of any one of the Drake random variables $D_i$ is, it is safer to assume that each of them has the maximum possible entropy over $a_i \le x \le b_i$, i.e., that $D_i$ is UNIFORMLY distributed there.”


The proof of this theorem is along the same lines as for the previous two cases discussed by Shannon.


We start by assuming that $a_i \le x \le b_i$.


We then form the linear combination of the entropy integral plus the normalization condition for $D_i$, and require its variation to vanish:

$\delta \int_{a_i}^{b_i} \left[-p(x) \log p(x) + \lambda\, p(x)\right] dx = 0$

where λ is a Lagrange multiplier.


Performing the variation, one finds

$-\log p(x) - 1 + \lambda = 0$, that is: $p(x) = e^{\lambda - 1}$.


Applying the normalization condition (constraint) to the last expression for p(x) yields

$1 = \int_{a_i}^{b_i} p(x)\, dx = \int_{a_i}^{b_i} e^{\lambda - 1}\, dx = e^{\lambda - 1}\int_{a_i}^{b_i} dx = e^{\lambda - 1}(b_i - a_i)$

that yields

$e^{\lambda - 1} = \frac{1}{b_i - a_i}$

and finally

$p(x) = \frac{1}{b_i - a_i}$ with $a_i \le x \le b_i$,

showing that the maximum-entropy probability distribution over any FINITE interval $a_i \le x \le b_i$ is the UNIFORM distribution.
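The maximum-entropy property can also be seen numerically. This Python sketch is illustrative only: the interval [2, 5] and the two competing densities (a symmetric triangular one and a truncated exponential one) are arbitrary choices. It computes the differential entropy $-\int p \ln p\, dx$ of each density on the same finite interval; the uniform one should come out largest, equal to $\ln(b - a)$.

```python
import math


def differential_entropy(pdf, a, b, n=100_000):
    """H = -integral over [a, b] of p(x) ln p(x) dx, by the midpoint rule (in nats)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        p = pdf(a + (k + 0.5) * h)
        if p > 0.0:  # the integrand p ln p tends to 0 as p -> 0
            total -= p * math.log(p)
    return total * h


A, B = 2.0, 5.0  # arbitrary finite interval a <= x <= b


def uniform(x):
    return 1.0 / (B - A)


def triangular(x):
    """Symmetric triangular density on [A, B], peaking at the midpoint."""
    mid, half = (A + B) / 2.0, (B - A) / 2.0
    return max(0.0, (half - abs(x - mid)) / half**2)


def truncated_exponential(x, rate=1.0):
    """Exponential density with the given rate, truncated to [A, B] and renormalized."""
    norm = math.exp(-rate * A) - math.exp(-rate * B)
    return rate * math.exp(-rate * x) / norm


h_uniform = differential_entropy(uniform, A, B)
h_triangular = differential_entropy(triangular, A, B)
h_exp = differential_entropy(truncated_exponential, A, B)

print(f"uniform {h_uniform:.4f}  (ln(b - a) = {math.log(B - A):.4f})")
print(f"triangular {h_triangular:.4f}, truncated exponential {h_exp:.4f}")
```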
Appendix B: Original Text of the Author’s Paper #IAC-08-A4.1.4
Titled the Statistical Drake Equation


IAC-08-A4.1.4 


THE STATISTICAL DRAKE EQUATION 


Claudio Maccone 
Co-Vice Chair, SETI Permanent Study Group, International Academy of Astronautics


Address: Via Martorelli, 43 - Torino (Turin) 10155 - Italy
URL: http://www.maccone.com/ - E-mail: clmaccon@libero.it 


ABSTRACT. We provide the statistical generalization of the Drake equation. 


From a simple product of seven positive numbers, the Drake equation is now turned into the product of seven positive random variables. We call this “the Statistical Drake Equation.” The mathematical consequences of this transformation are then derived. The proof of our results is based on the Central Limit Theorem (CLT) of Statistics. In loose terms, the CLT states that the sum of a large number of independent random variables, each of which may be ARBITRARILY distributed, approaches a Gaussian (i.e. normal) random variable. This is called the Lyapunov Form of the CLT, or the Lindeberg Form of the CLT, depending on the mathematical constraints assumed on the third moments of the various probability distributions. In conclusion, we show that:

1) The new random variable N, yielding the number of communicating civilizations in the Galaxy, follows the LOGNORMAL distribution. Then, as a consequence, the mean value of this lognormal distribution is the ordinary N in the Drake equation. The standard deviation, mode, and all the moments of this lognormal N are found also.

2) The seven factors in the ordinary Drake equation now become seven positive random variables. The probability distribution of each random variable may be ARBITRARY. The CLT in the so-called Lyapunov or Lindeberg forms (neither of which assumes the factors to be identically distributed) allows for that. In other words, the CLT “translates” into our statistical Drake equation by allowing an arbitrary probability distribution for each factor. This is both physically realistic and practically very useful, of course.

3) An application of our statistical Drake equation then follows. The (average) DISTANCE between any two neighboring and communicating civilizations in the Galaxy may be shown to be inversely proportional to the cubic root of N. Then, in our approach, this distance becomes a new random variable. We derive the relevant probability density function, apparently previously unknown and dubbed “Maccone distribution” by Paul Davies.

4) DATA ENRICHMENT PRINCIPLE. It should be noticed that ANY positive number of random variables in the Statistical Drake Equation is compatible with the CLT. So, our generalization allows for many more factors to be added in the future as more refined scientific knowledge about each factor becomes available. This capability to make room for more future factors in the statistical Drake equation we call the “Data Enrichment Principle”, and we regard it as the key to more profound future results in the fields of Astrobiology and SETI.


Finally, a practical example is given of how our statistical Drake equation works numerically. We work out in detail the case where each of the seven random variables is uniformly distributed around its own mean value and has a given standard deviation. For instance, the number of stars in the Galaxy is assumed to be uniformly distributed around (say) 350 billion with a standard deviation of (say) 1 billion. Then, the resulting lognormal distribution of N is computed numerically by virtue of a MathCad file that the author has written. This shows that the mean value of the lognormal random variable N is actually of the same order as the classical N given by the ordinary Drake equation, as one might expect from a good statistical generalization.


1. INTRODUCTION


The Drake equation is a now famous result (see ref. [1] for the Wikipedia summary) in the fields of SETI (the Search for ExtraTerrestrial Intelligence, see ref. [2]) and Astrobiology (see ref. [3]). Devised in 1960, the Drake equation was the first scientific attempt to estimate the number N of ExtraTerrestrial civilizations in the Galaxy with which we might come in contact. Frank D. Drake (see ref. [4]) proposed it as the product of seven factors:


$N = N_s \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot f_L$   (1)

where:
1) Ns is the estimated number of stars in our Galaxy.


2) fp is the fraction (= percentage) of such stars that have planets.

3) ne is the number of “Earth-type” such planets around the given star; in other words, ne is the number of planets, in a given stellar system, on which the chemical conditions exist for life to begin its course: they are “ready for life.”

4) fl is the fraction (= percentage) of such “ready for life” planets on which life actually starts and grows up (but not yet to the “intelligence” level).

5) fi is the fraction (= percentage) of such 
“planets with life forms” that actually evolve 
until some form of “intelligent civilization” 
emerges (like the first, historic human 
civilizations on Earth). 

6) fc is the fraction (= percentage) of such “planets with civilizations” where the civilizations evolve to the point of being able to communicate across the interstellar distances with other (at least) similarly evolved civilizations. As far as we know in 2008, this means that they must be aware of the Maxwell equations governing radio waves, as well as of computers and radioastronomy (at least).

7) fL is the fraction of galactic civilizations alive at the time when we, poor humans, attempt to pick up their radio signals (that they throw out into space just as we have done since 1900, when Marconi started the transatlantic transmissions). In other words, fL is the fraction of civilizations now transmitting and receiving, and this implies an estimate of “how long will a technological civilization live?” that nobody can make at the moment. Also, are they going to destroy themselves in a nuclear war, and thus live only a few decades of technological civilization? Or are they slowly becoming wiser, rejecting war, speaking a single language (like English today), and merging into a single “nation”, thus living in peace for ages? Or will robots take over one day, making “flesh animals” disappear forever (the so-called “post-biological universe”)? No one knows...


But let us go back to the Drake equation (1). 

In the fifty years of its existence, a number of suggestions have been put forward about the different numeric values of its seven factors. Of course, every different set of these seven input numbers yields a different value for N, and we can endlessly play that way. But we claim that these are like... children's games!


We claim that the classical Drake equation (1), as we shall call it from now on to distinguish it from our statistical Drake equation to be introduced in the coming sections, is scientifically inadequate in one regard at least: it just handles sheer numbers and does not associate an error bar with each of its seven factors. At the very least, we want to associate an error bar with each of them.


Well, we have thus reached STEP ONE in our improvement of the classical Drake equation: replace each sheer number by a probability distribution!


The reader is now asked to look at the flow chart on the next page as a guide to this paper.


2. STEP 1: LETTING EACH FACTOR 
BECOME A RANDOM VARIABLE 


In this paper we adopt the notations of the great book “Probability, Random Variables and Stochastic Processes” by Athanasios Papoulis (1921-2002), now re-published as Papoulis-Pillai, ref. [5]. The advantage of this notation is that it makes a neat distinction between probabilistic (or statistical: it's the same thing here) variables, always denoted by capitals, and non-probabilistic (or “deterministic”) variables, always denoted by lower-case letters. Adopting the Papoulis notation is also a tribute to him by this author, who was a Fulbright Grantee in the United States with him at the Polytechnic Institute (now Polytechnic University) of New York in the years 1977-78-79.


We thus introduce seven new (positive) random variables $D_i$ (“D” from “Drake”) defined as


$D_1 = N_s,\quad D_2 = f_p,\quad D_3 = n_e,\quad D_4 = f_l,\quad D_5 = f_i,\quad D_6 = f_c,\quad D_7 = f_L$   (2)


so that our STATISTICAL Drake equation may be simply rewritten as

$N = \prod_{i=1}^{7} D_i.$   (3)


Of course, N now becomes a (positive) random variable too, having its own (positive) mean value and standard deviation. Just as each of the $D_i$ has its own (positive) mean value and standard deviation...

... the natural question then arises: how are the seven mean values on the right related to the mean value on the left?

... and how are the seven standard deviations on the right related to the standard deviation on the left?

Just take the next step...


3. STEP 2: INTRODUCING LOGS TO 
CHANGE THE PRODUCT INTO A SUM 


Products of random variables are not easy to handle in probability theory. It is actually much easier to handle sums of random variables, rather than products, because:

1) The probability density of the sum of two or more independent random variables is the convolution of the relevant probability densities (worry not about the equations, right now).

2) The Fourier transform of the convolution simply is the product of the Fourier transforms (again, worry not about the equations, at this point).
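Point 1) can be illustrated with a few lines of Python. This sketch discretizes the density of a uniform variable on [0, 1] and convolves it with itself; the result approximates the triangular density of the sum of two such independent variables on [0, 2], which peaks at 1 for x = 1. The grid step is an arbitrary choice.

```python
def convolve(p, q, dx):
    """Discrete approximation of (p * q)(x) = integral of p(t) q(x - t) dt.

    p and q are densities sampled at cell midpoints (i + 0.5) * dx, so out[k]
    approximates the density of the sum at x = (k + 1) * dx.
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j * dx
    return out


dx = 0.01
uniform01 = [1.0] * 100  # density of U(0, 1) sampled on a grid of step dx
density_of_sum = convolve(uniform01, uniform01, dx)

peak = max(density_of_sum)
x_peak = (density_of_sum.index(peak) + 1) * dx
print(f"peak {peak:.3f} at x = {x_peak:.2f}, total probability {sum(density_of_sum) * dx:.6f}")
```

Repeating the convolution with more factors makes the result look increasingly bell-shaped, which is the CLT at work.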
1. Introduction

2. Step 1: Letting each factor become a random variable.
2.1. Step 2: Introducing logs to change the product into a sum.
2.2. Step 3: The transformation law of random variables.

3. Step 4: Assuming the easiest input distribution for each $D_i$: the uniform distribution.
3.1. Step 5: A numerical example of the Statistical Drake equation with uniform distributions for the Drake random variables $D_i$.
3.2. Step 6: Computing the logs of the 7 uniformly distributed Drake random variables $D_i$.
3.3. Step 7: Finding the probability density function of Y, but only numerically, not analytically. DEAD END!

4. The Central Limit Theorem (CLT) of Statistics.

5. LOGNORMAL distribution as the probability distribution of the number N of communicating ExtraTerrestrial Civilizations in the Galaxy.

6. Comparing the CLT results with the Non-CLT results, and discarding the Non-CLT approach.

7. DISTANCE to the nearest ExtraTerrestrial Civilization as a probability distribution (Paul Davies dubbed it the Maccone distribution).
7.1. Classical, non-probabilistic derivation of the Distance to the nearest ET Civilization.
7.2. Probabilistic derivation of the probability density function for the nearest ET Civilization Distance.
7.3. Statistical properties of the distribution.
7.4. Numerical example of the distribution.

8. DATA ENRICHMENT PRINCIPLE as the best CLT consequence upon the Drake equation: any number of factors allowed for.
So, let us take the natural logs of both sides of the Statistical Drake equation (3) and change it into a sum:


$\ln(N) = \ln\left(\prod_{i=1}^{7} D_i\right) = \sum_{i=1}^{7} \ln(D_i).$   (4)


It is now convenient to introduce eight new random variables (unlike the $D_i$, these need not be positive, since the log of a fraction smaller than 1 is negative), defined as follows:


$Y = \ln(N), \qquad Y_i = \ln(D_i), \quad i = 1, \ldots, 7.$   (5)


Upon inversion, the first equation of (5) yields the important equation that will be used in the sequel:


$N = e^{Y}.$   (6)
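In numerical terms, equations (3) to (6) simply say that a product may be evaluated as the exponential of a sum of logs. The tiny Python sketch below uses hypothetical factor values: apart from the 350 billion stars, which echo the text, they are illustrative choices that happen to multiply to 3500.

```python
import math

# Hypothetical factor values (Ns, fp, ne, fl, fi, fc, fL), chosen for illustration.
factors = [350e9, 0.5, 1.0, 0.5, 0.2, 0.2, 1e-6]

n_product = math.prod(factors)          # equation (3): N as a product
y = sum(math.log(d) for d in factors)   # equations (4)-(5): Y = sum of the Y_i = ln(D_i)
n_from_logs = math.exp(y)               # equation (6): N = e^Y

print(n_product, n_from_logs, y)
```

Note that several of the $Y_i$ are negative (every factor below 1 contributes a negative log), while their sum Y is about 8.16 here.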


We are now ready to take STEP THREE. 


STEP 3: THE TRANSFORMATION LAW 
OF RANDOM VARIABLES 


So far we did not mention at all the problem: “which probability distribution shall we attach to each of the seven (positive) random variables $D_i$?”


It is not easy to answer this question because we do not have the least scientific clue as to which probability distributions best fit each of the seven points listed in Section 1.


Yet, at least one trivial error must be avoided: claiming that each of those seven random variables must have a Gaussian (i.e. normal) distribution. In fact, the Gaussian distribution, having the well-known bell-shaped probability density function


$f_Y(y; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y - \mu)^2}{2\sigma^2}}$   (7)


has its independent variable y ranging between $-\infty$ and $+\infty$, and so it can apply to a real random variable Y only, and never to positive random variables like those in the statistical Drake equation (3). Period.


Searching again for probability density functions that represent positive random variables, an obvious choice would be the gamma distributions (see, for instance, ref. [6]). However, we discarded this choice too, for a different reason: please keep in mind that, according to (5), once we select a particular type of probability density function (pdf) for the $D_i$ in the last seven of equations (5), we must then compute the (new and different) pdf of the logs of such random variables. And the pdf of these logs certainly is not gamma-type any more.


It is now high time to remind the reader of a certain theorem that is proved in probability courses but, unfortunately, does not seem to have a specific name. It is the transformation law (so we shall call it, see, for instance, ref. [5]) allowing us to compute the pdf of a certain new random variable Y that is a known function Y = g(X) of another random variable X having a known pdf. In other words, if the pdf $f_X(x)$ of a certain random variable X is known, then the pdf $f_Y(y)$ of the new random variable Y, related to X by the functional relationship


$Y = g(X)$   (8)


can be calculated according to this rule:

1) First, invert the corresponding non-probabilistic equation y = g(x) and denote by $x_i(y)$ the various real roots resulting from this inversion.

2) Second, take notice whether these real roots may be either finitely- or infinitely-many, according to the nature of the function y = g(x).

3) Third, the probability density function of Y is then given by the (finite or infinite) sum


$f_Y(y) = \sum_i \frac{f_X(x_i(y))}{\left|g'(x_i(y))\right|}$   (9)


where the summation extends to all roots $x_i(y)$ and $|g'(x_i(y))|$ is the absolute value of the first derivative of g(x) where the i-th root $x_i(y)$ has been substituted for x.
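Applied to the case needed later, $Y = \ln(D)$ with D uniform on $[a, b]$, the rule has a single root $x(y) = e^{y}$ and $g'(x) = 1/x$, so (9) gives $f_Y(y) = e^{y}/(b - a)$ for $\ln a \le y \le \ln b$. The Python sketch below illustrates this with hypothetical limits a = 0.5 and b = 2; the helper `transform_pdf` is our own name, not a library function. It applies (9) generically and checks the result against that closed form and against a simulation.

```python
import math
import random


def transform_pdf(f_x, roots, g_prime):
    """Transformation law (9): f_Y(y) = sum over roots x_i(y) of f_X(x_i) / |g'(x_i)|."""
    def f_y(y):
        return sum(f_x(x) / abs(g_prime(x)) for x in roots(y))
    return f_y


a, b = 0.5, 2.0  # hypothetical limits of one uniform D_i


def f_uniform(x):
    return 1.0 / (b - a) if a <= x <= b else 0.0


# Y = ln(D): y = ln(x) has the single root x = e^y, and g'(x) = 1/x.
f_log = transform_pdf(f_uniform, lambda y: [math.exp(y)], lambda x: 1.0 / x)

y0 = 0.3
print(f_log(y0), math.exp(y0) / (b - a))  # rule (9) vs. closed form e^y / (b - a)

# Monte Carlo check of the CDF implied by that density: P(Y <= y0) = (e^y0 - a) / (b - a).
rng = random.Random(0)
samples = [math.log(rng.uniform(a, b)) for _ in range(100_000)]
frac_below = sum(s <= y0 for s in samples) / len(samples)
print(frac_below, (math.exp(y0) - a) / (b - a))
```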


Since we must use this transformation law to transfer from the $D_i$ to the $Y_i = \ln(D_i)$, it is clear that we need to start from a $D_i$ pdf that is as simple as possible. The gamma pdf does not meet this need because the analytic expression of the transformed pdf is very complicated (or, at least, it looked so to this author in the first instance). Also, the gamma distribution has two free parameters in it, and this “complicates” its application to the various meanings of the Drake equation. In conclusion, we discarded the gamma distributions and confined ourselves to the simpler uniform distribution instead, as shown in the next section.


4. STEP 4: ASSUMING THE EASIEST
INPUT DISTRIBUTION FOR EACH $D_i$:
THE UNIFORM DISTRIBUTION


Let us now suppose that each of the seven $D_i$ is distributed UNIFORMLY in the interval ranging from the lower limit $a_i > 0$ to the upper limit $b_i > a_i$.

This is the same as saying that the probability density function of each of the seven Drake random variables $D_i$ has the equation


$f_{\mathrm{uniform\_}D_i}(x) = \frac{1}{b_i - a_i}$ with $0 < a_i \le x \le b_i$   (10)


as follows at once from the normalization condition

$\int_{a_i}^{b_i} f_{\mathrm{uniform\_}D_i}(x)\, dx = 1.$   (11)


Let us now consider the mean value of such a uniform $D_i$, defined by


$\langle \mathrm{uniform\_}D_i \rangle = \int_{a_i}^{b_i} x\, f_{\mathrm{uniform\_}D_i}(x)\, dx = \frac{1}{b_i - a_i}\int_{a_i}^{b_i} x\, dx = \frac{b_i^2 - a_i^2}{2(b_i - a_i)}.$


In words (as is intuitively obvious): the mean value of the uniform distribution simply is the average of the lower and upper limits of the variable range


$\langle \mathrm{uniform\_}D_i \rangle = \frac{a_i + b_i}{2}.$   (12)


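In order to find the variance of the uniform distribution, we first need to find the second moment


$\langle \mathrm{uniform\_}D_i^2 \rangle = \int_{a_i}^{b_i} x^2\, f_{\mathrm{uniform\_}D_i}(x)\, dx = \frac{1}{b_i - a_i}\int_{a_i}^{b_i} x^2\, dx = \frac{1}{b_i - a_i}\left[\frac{x^3}{3}\right]_{a_i}^{b_i} = \frac{b_i^3 - a_i^3}{3(b_i - a_i)} = \frac{(b_i - a_i)(a_i^2 + a_i b_i + b_i^2)}{3(b_i - a_i)} = \frac{a_i^2 + a_i b_i + b_i^2}{3}.$


The second moment of the uniform distribution is thus


$\langle \mathrm{uniform\_}D_i^2 \rangle = \frac{a_i^2 + a_i b_i + b_i^2}{3}.$   (13)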


From (12) and (13) we may now derive the variance of the uniform distribution


$\sigma^2_{\mathrm{uniform\_}D_i} = \langle \mathrm{uniform\_}D_i^2 \rangle - \langle \mathrm{uniform\_}D_i \rangle^2 = \frac{a_i^2 + a_i b_i + b_i^2}{3} - \frac{(a_i + b_i)^2}{4} = \frac{(b_i - a_i)^2}{12}.$   (14)


Upon taking the square root of both sides of (14), we finally obtain the standard deviation of the uniform distribution:


$\sigma_{\mathrm{uniform\_}D_i} = \frac{b_i - a_i}{2\sqrt{3}}.$   (15)
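Since equations (12) to (14) are pure algebra, they can be checked exactly with rational arithmetic. This Python sketch uses the standard `fractions` module, so the comparisons are exact equalities rather than floating-point approximations; the sample intervals are arbitrary.

```python
from fractions import Fraction


def uniform_moments(a, b):
    """Exact mean (12), second moment (13) and variance (14) of a uniform variable on [a, b]."""
    mean = (a + b) / 2
    second = (a * a + a * b + b * b) / 3
    variance = second - mean * mean
    return mean, second, variance


for a, b in [(Fraction(1), Fraction(3)), (Fraction(2, 7), Fraction(9, 4))]:
    mean, second, variance = uniform_moments(a, b)
    assert second == (b**3 - a**3) / (3 * (b - a))  # direct integral of x^2 / (b - a)
    assert variance == (b - a) ** 2 / 12             # right-hand side of (14)
    print(a, b, mean, variance)
```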


We now wish to perform a calculation that is
mathematically trivial, but rather unexpected from
the intuitive point of view, and very important for our
applications to the statistical Drake equation. Just
consider the two simultaneous equations (12) and
(15)

$$\begin{cases} \langle \mathrm{uniform\_}D_i\rangle = \dfrac{a_i + b_i}{2} \\[6pt] \sigma_{\mathrm{uniform\_}D_i} = \dfrac{b_i - a_i}{2\sqrt{3}} \end{cases} \tag{16}$$

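Upon inverting this trivial linear system, one finds

$$\begin{cases} a_i = \langle \mathrm{uniform\_}D_i\rangle - \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_i} \\ b_i = \langle \mathrm{uniform\_}D_i\rangle + \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_i} \end{cases} \tag{17}$$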


This is of paramount importance for our application
to the Statistical Drake equation inasmuch as it shows
that:

if one (scientifically) assigns the mean value and
standard deviation of a certain Drake random
variable D_i, then the lower and upper limits of the
relevant uniform distribution are given by the two
equations (17), respectively.

In other words, there is a factor of √3 ≈ 1.732
included in the two equations (17) that is not obvious
at all to human intuition, and must indeed be taken
into account.
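A minimal Python sketch of (12), (15) and (17), assuming nothing beyond the formulas above: it turns an assigned mean value and standard deviation into the limits a_i and b_i of the corresponding uniform distribution, and back. The function names are hypothetical, chosen only for this illustration; the sample inputs are the N_s, f_p and n_e values used in the next section.

```python
import math


def uniform_limits(mean, std):
    """Lower and upper limits (a, b) of a uniform distribution, per equation (17)."""
    half_width = math.sqrt(3.0) * std
    return mean - half_width, mean + half_width


def uniform_mean_std(a, b):
    """Mean (12) and standard deviation (15) of a uniform distribution on [a, b]."""
    return (a + b) / 2.0, (b - a) / (2.0 * math.sqrt(3.0))


if __name__ == "__main__":
    # Mean values and standard deviations used in Section 3.1 for N_s, f_p and n_e
    inputs = [("N_s", 350e9, 1e9), ("f_p", 0.5, 0.1), ("n_e", 1.0, 1.0 / math.sqrt(3.0))]
    for name, mean, std in inputs:
        a, b = uniform_limits(mean, std)
        print(f"{name}: a = {a:.4g}, b = {b:.4g}")
```

Up to floating-point rounding, this reproduces the limits 348.3·10^9 and 351.7·10^9 for N_s, 0.327 and 0.673 for f_p, and 0 and 2 for n_e.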


The application of this result to the Statistical Drake
equation is discussed in the next section.


3.1 STEP 5: A NUMERICAL EXAMPLE
OF THE STATISTICAL DRAKE
EQUATION WITH UNIFORM
DISTRIBUTIONS FOR THE DRAKE
RANDOM VARIABLES D_i


The first variable N_s in the classical Drake
equation (1) is the number of stars in our Galaxy.
Nobody knows exactly how many there are. Only
statistical estimates can be made by astronomers, and
they oscillate (say) around a mean value of 350
billion (if this value is indeed correct!). This being
the situation, we assume that our uniformly
distributed random variable N_s has a mean value of
350 billion plus or minus a standard deviation of
(say) one billion (we don't care whether this number
is scientifically the best estimate as of August 2008:
we just want to set up a numerical example of our
Statistical Drake equation). In other words, we now
assume that one has:

$$\begin{cases} \langle \mathrm{uniform\_}D_1\rangle = 350 \cdot 10^{9} \\ \sigma_{\mathrm{uniform\_}D_1} = 1 \cdot 10^{9} \end{cases} \tag{18}$$


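Therefore, according to equations (17), the lower and
upper limits of our uniform distribution for the
random variable N_s = D_1 are, respectively,

$$\begin{cases} a_{N_s} = \langle \mathrm{uniform\_}D_1\rangle - \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_1} \approx 348.3 \cdot 10^{9} \\ b_{N_s} = \langle \mathrm{uniform\_}D_1\rangle + \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_1} \approx 351.7 \cdot 10^{9} \end{cases} \tag{19}$$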


We proceed similarly for all the other six random
variables in the Statistical Drake equation (3).


For instance, we assume that the fraction of stars
that have planets is 50%, i.e. 50/100, and this will be
the mean value of the random variable f_p = D_2. We
also assume that the relevant standard deviation will
be 10%, i.e. that σ_{f_p} = 10/100. Therefore, the
relevant lower and upper limits for the uniform
distribution of f_p = D_2 turn out to be

$$\begin{cases} a_{f_p} = \langle \mathrm{uniform\_}D_2\rangle - \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_2} \approx 0.327 \\ b_{f_p} = \langle \mathrm{uniform\_}D_2\rangle + \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_2} \approx 0.673 \end{cases} \tag{20}$$


The next Drake random variable is the number
n_e of “Earth-type” planets in a given star system.
Taking the Solar System as an example, since only
the Earth is truly “Earth-type”, the mean value of n_e
is clearly 1, but the standard deviation is not zero if
we assume that Mars may also be regarded as Earth-type.
Since there are thus two Earth-type planets in
the Solar System, we must assume a standard
deviation of 1/√3 ≈ 0.577 to compensate for the √3
appearing in (17), in order to finally yield two “Earth-type”
planets (Earth and Mars) for the upper limit of
the random variable n_e. In other words, we assume
that

$$\begin{cases} a_{n_e} = \langle \mathrm{uniform\_}D_3\rangle - \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_3} = 0 \\ b_{n_e} = \langle \mathrm{uniform\_}D_3\rangle + \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_3} = 2 \end{cases} \tag{21}$$


The next four Drake random variables have even 
more “arbitrarily” assumed values that we simply 
assume for the sake of making up a numerical 
example of our Statistical Drake equation with 
uniform entry distributions. So, we really make no 
assumption about the astronomy, or the biology, or 
the sociology of the Drake equation: we just care 
about its mathematics. 


All our assumed entries are given in Table 1. 


Please notice that, had we assumed all the
standard deviations to equal zero in Table 1, then our
Statistical Drake equation (3) would obviously have
reduced to the classical Drake equation (1), and the
resulting number of civilizations in the Galaxy would
have turned out to be 3500:

$$N = 3500. \tag{22}$$

This is the important deterministic number that we
will use in the sequel of this paper for comparison
with our statistical results on the mean value of N,
i.e. ⟨N⟩. This will be explained in Sections 3.3 and 5.
Table 1. Input values (i.e. mean values and standard deviations) for the seven Drake uniform random variables D_i.
The first column on the left lists the seven input sheer numbers that also become the mean values (middle column).
Finally, the last column on the right lists the seven input standard deviations. The bottom line is the classical Drake
equation (1).


3.2 STEP 6: COMPUTING THE LOGS
OF THE 7 UNIFORMLY
DISTRIBUTED DRAKE RANDOM
VARIABLES D_i


Intuitively speaking, the natural log of a
uniformly distributed random variable is not
another uniformly distributed random variable! This
is obvious from the trivial diagram of y = ln(x)
shown below:

[Figure 1: plot of the natural logarithm of x; horizontal axis: POSITIVE independent variable x; vertical axis: REAL values of the natural log y = ln(x).]

Figure 1. The simple function y = ln(x).
So, if we have a uniformly distributed random
variable D_i with lower limit a_i and upper limit b_i, the
random variable

$$Y_i = \ln(D_i), \qquad i = 1, \ldots, 7 \tag{23}$$

must have its range limited in between the lower limit
ln(a_i) and the upper limit ln(b_i). In other words, these
are the lower and upper limits of the relevant
probability density function f_{Y_i}(y). But what is the
actual analytic expression of such a pdf? To find it,
we must resort to the general transformation law for
random variables, defined by equation (9). Here we
obviously have

$$y = g(x) = \ln(x), \tag{24}$$

which, upon inversion, yields the single root

$$x_1(y) = g^{-1}(y) = e^{y}. \tag{25}$$


On the other hand, differentiating (24) one gets

$$g'(x) = \frac{1}{x} \quad\text{and}\quad g'(x_1(y)) = \frac{1}{x_1(y)} = \frac{1}{e^{y}} = e^{-y}, \tag{26}$$

where (25) was already used in the last step. By
virtue of the uniform probability density function
(10) and of (26), the general transformation law (9)
finally yields

$$f_{Y_i}(y) = \frac{f_{\mathrm{uniform\_}D_i}(x_1(y))}{|g'(x_1(y))|} = \frac{1}{b_i - a_i}\cdot\frac{1}{e^{-y}} = \frac{e^{y}}{b_i - a_i}. \tag{27}$$

In other words, the requested pdf of Y_i is

$$f_{Y_i}(y) = \frac{e^{y}}{b_i - a_i}, \qquad \ln(a_i) \le y \le \ln(b_i). \tag{28}$$

This is the probability density function of the natural log
of each of the uniformly distributed Drake random
variables D_i.


This is indeed a positive function of y over the
interval ln(a_i) ≤ y ≤ ln(b_i), as for every pdf, and it is
easy to see that its normalization condition is
fulfilled:

$$\int_{\ln(a_i)}^{\ln(b_i)} f_{Y_i}(y)\,dy = \int_{\ln(a_i)}^{\ln(b_i)} \frac{e^{y}}{b_i - a_i}\,dy = \frac{e^{\ln(b_i)} - e^{\ln(a_i)}}{b_i - a_i} = \frac{b_i - a_i}{b_i - a_i} = 1. \tag{29}$$


Next we want to find the mean value and
standard deviation of Y_i, since these play a crucial
role for future developments. The mean value ⟨Y_i⟩ is
given by

$$\langle Y_i\rangle = \int_{\ln(a_i)}^{\ln(b_i)} y\,f_{Y_i}(y)\,dy = \int_{\ln(a_i)}^{\ln(b_i)} \frac{y\,e^{y}}{b_i - a_i}\,dy = \frac{b_i[\ln(b_i) - 1] - a_i[\ln(a_i) - 1]}{b_i - a_i}, \tag{30}$$

since an antiderivative of y e^y is (y − 1) e^y. This is thus the mean value of the natural log of all
the uniformly distributed Drake random variables
D_i:

$$\langle Y_i\rangle = \langle \ln(D_i)\rangle = \frac{b_i[\ln(b_i) - 1] - a_i[\ln(a_i) - 1]}{b_i - a_i}. \tag{31}$$
In order to find the variance as well, we must first
compute the mean value of the square of Y_i, that is

$$\langle Y_i^2\rangle = \int_{\ln(a_i)}^{\ln(b_i)} y^2 f_{Y_i}(y)\,dy = \int_{\ln(a_i)}^{\ln(b_i)} \frac{y^2 e^{y}}{b_i - a_i}\,dy = \frac{b_i[\ln^2(b_i) - 2\ln(b_i) + 2] - a_i[\ln^2(a_i) - 2\ln(a_i) + 2]}{b_i - a_i}. \tag{32}$$

The variance of Y_i = ln(D_i) is now given by (32)
minus the square of (31), which, after a few reductions,
yields:

$$\sigma_{Y_i}^2 = \sigma_{\ln(D_i)}^2 = 1 - \frac{a_i b_i\,[\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2}. \tag{33}$$

Whence the corresponding standard deviation

$$\sigma_{Y_i} = \sigma_{\ln(D_i)} = \sqrt{1 - \frac{a_i b_i\,[\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2}}. \tag{34}$$
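The closed forms (31) and (33) are easy to check against a simulation. The Python sketch below (function names are hypothetical) evaluates them and compares them with a Monte Carlo estimate obtained by drawing D uniformly and taking logs. The case a_i = 0, which occurs for n_e in (21), is handled through the limit a ln(a) → 0, under which (31) reduces to ln(b_i) − 1 and (33) reduces to 1.

```python
import math
import random


def log_uniform_mean(a, b):
    """Equation (31): mean of Y = ln(D) for D uniform on [a, b], with 0 <= a < b."""
    a_term = 0.0 if a == 0 else a * (math.log(a) - 1.0)   # a*ln(a) -> 0 as a -> 0
    return (b * (math.log(b) - 1.0) - a_term) / (b - a)


def log_uniform_variance(a, b):
    """Equation (33): variance of Y = ln(D) for D uniform on [a, b], with 0 <= a < b."""
    if a == 0:
        return 1.0                                         # limit of (33) as a -> 0
    return 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2


def monte_carlo_moments(a, b, samples=200_000, seed=1):
    """Sample mean and variance of ln(D), drawing D uniformly on [a, b] (requires a > 0)."""
    rng = random.Random(seed)
    ys = [math.log(rng.uniform(a, b)) for _ in range(samples)]
    mean = sum(ys) / samples
    var = sum((y - mean) ** 2 for y in ys) / (samples - 1)
    return mean, var


if __name__ == "__main__":
    a, b = 0.327, 0.673   # the f_p limits from (20), used here only as sample inputs
    print("formulas (31), (33):", log_uniform_mean(a, b), log_uniform_variance(a, b))
    print("Monte Carlo:        ", monte_carlo_moments(a, b))
```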


Let us now turn to another topic: the use of
Fourier transforms, which, in probability theory, are
called “characteristic functions.” Following again the
notations of Papoulis (ref. [$]) we call “characteristic
function”, Φ_{Y_i}(ζ), of an assigned probability
distribution Y_i, the Fourier transform of the relevant
probability density function, that is (with j = √−1)

$$\Phi_{Y_i}(\zeta) = \int_{-\infty}^{\infty} e^{j\zeta y}\,f_{Y_i}(y)\,dy. \tag{35}$$

The use of characteristic functions simplifies things
greatly. For instance, the calculation of all moments
of a known pdf becomes trivial if the relevant
characteristic function is known, and the proofs of
important theorems of statistics are also greatly simplified,
like the Central Limit Theorem that we
will use in Section 4. Another important result is that
the characteristic function of the sum of a finite
number of independent random variables is simply
given by the product of the corresponding
characteristic functions. This is just the case we are
facing in the Statistical Drake equation (3), and so we
are now led to find the characteristic function of the
random variable Y_i, i.e.
$$\Phi_{Y_i}(\zeta) = \int_{\ln(a_i)}^{\ln(b_i)} e^{j\zeta y}\,\frac{e^{y}}{b_i - a_i}\,dy = \frac{1}{b_i - a_i}\int_{\ln(a_i)}^{\ln(b_i)} e^{(1+j\zeta)y}\,dy = \frac{e^{(1+j\zeta)\ln(b_i)} - e^{(1+j\zeta)\ln(a_i)}}{(b_i - a_i)(1 + j\zeta)}. \tag{36}$$

Thus, the characteristic function of the natural log
of the Drake uniform random variable D_i is given by

$$\Phi_{Y_i}(\zeta) = \frac{b_i^{\,j\zeta+1} - a_i^{\,j\zeta+1}}{(b_i - a_i)(1 + j\zeta)}. \tag{37}$$
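As a sanity check on (37), the following Python sketch compares the closed form with a direct numerical evaluation of the definition (35), using the pdf (28) and Simpson's rule. The function names and the test interval [1, 3] are illustrative assumptions; both limits must be positive here so that ln(a) is finite.

```python
import cmath
import math


def phi_closed_form(zeta, a, b):
    """Equation (37): characteristic function of Y = ln(D), D uniform on [a, b], 0 < a < b."""
    s = 1 + 1j * zeta                      # the exponent 1 + j*zeta
    return (b ** s - a ** s) / ((b - a) * s)


def phi_by_simpson(zeta, a, b, steps=20_000):
    """Definition (35) with the pdf (28), integrated by Simpson's rule (steps must be even)."""
    lo, hi = math.log(a), math.log(b)
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps + 1):
        y = lo + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * cmath.exp(1j * zeta * y) * math.exp(y) / (b - a)
    return total * h / 3


if __name__ == "__main__":
    for zeta in (0.0, 0.7, 3.0):
        print(zeta, phi_closed_form(zeta, 1.0, 3.0), phi_by_simpson(zeta, 1.0, 3.0))
```

At ζ = 0 both expressions reduce to 1, which is just the normalization (29) seen through (35).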


3.3 STEP 7: FINDING THE
PROBABILITY DENSITY
FUNCTION OF N, BUT ONLY
NUMERICALLY, NOT
ANALYTICALLY

Having found the characteristic functions
Φ_{Y_i}(ζ) of the logs of the seven input random
variables D_i, we can now immediately find the
characteristic function of the random variable Y =
ln(N) defined by (5). In fact, by virtue of (4) (the pdf of a
sum of independent random variables being the convolution
of their pdfs), of the
well-known Fourier transform property stating that
“the Fourier transform of a convolution is the product
of the Fourier transforms”, and of (37), it
immediately follows that Φ_Y(ζ) equals the product
of the seven Φ_{Y_i}(ζ):

$$\Phi_Y(\zeta) = \prod_{i=1}^{7}\Phi_{Y_i}(\zeta) = \prod_{i=1}^{7}\frac{b_i^{\,j\zeta+1} - a_i^{\,j\zeta+1}}{(b_i - a_i)(1 + j\zeta)}. \tag{38}$$


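The next step is to invert this Fourier transform in
order to get the probability density function of the
random variable Y = ln(N). In other words, we must
compute the following inverse Fourier transform

$$f_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-j\zeta y}\,\Phi_Y(\zeta)\,d\zeta = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-j\zeta y}\prod_{i=1}^{7}\Phi_{Y_i}(\zeta)\,d\zeta = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-j\zeta y}\prod_{i=1}^{7}\frac{b_i^{\,j\zeta+1} - a_i^{\,j\zeta+1}}{(b_i - a_i)(1 + j\zeta)}\,d\zeta. \tag{39}$$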
This author regrets that he was unable to compute the
last integral analytically. He had to compute it
numerically for the particular values of the 14 a_i and
b_i that follow from Table 1 and equations (17). The
result was the probability density function for Y =
ln(N) plotted in the following Figure 2.


[Figure 2: PROBABILITY DENSITY FUNCTION OF Y = ln(N); horizontal axis: independent variable Y = ln(N); vertical axis: probability density function of Y.]

Figure 2. Probability density function of Y = ln(N)
computed numerically by virtue of the integral (39).
The two “funny gaps” in the curve are due to the
numeric limitations in the MathCad numeric solver
that the author used for this numeric computation.
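The author's numeric inversion of (39) was done in MathCad; the Python sketch below is not that computation, but one simple way to carry out the same kind of inversion under stated assumptions. The integral over ζ is truncated at a hypothetical cutoff ζ_max and evaluated with the trapezoidal rule, and only three hypothetical intervals (a_i, b_i) are used instead of the seven that follow from Table 1, to keep the run short. Because f_Y is real, Φ_Y(−ζ) is the complex conjugate of Φ_Y(ζ), so only ζ ≥ 0 needs to be sampled.

```python
import cmath
import math

# Hypothetical (a_i, b_i) limits, chosen only to keep this example small and fast;
# they are NOT the fourteen limits that follow from Table 1 and (17).
INTERVALS = [(1.0, 3.0), (0.5, 2.0), (2.0, 6.0)]


def phi_y(zeta, intervals):
    """Product (38) of the characteristic functions (37) of the ln(D_i)."""
    s = 1 + 1j * zeta
    result = 1 + 0j
    for a, b in intervals:
        result *= (b ** s - a ** s) / ((b - a) * s)
    return result


def pdf_y(ys, intervals, zeta_max=100.0, d_zeta=0.05):
    """Approximate the inverse transform (39) at the points ys.

    Because f_Y is real, (39) equals (1/pi) * Re of the integral over [0, inf);
    that integral is truncated at zeta_max (an assumption of this sketch) and
    evaluated by the trapezoidal rule.
    """
    n = int(round(zeta_max / d_zeta))
    zetas = [k * d_zeta for k in range(n + 1)]
    phis = [phi_y(z, intervals) for z in zetas]
    densities = []
    for y in ys:
        acc = 0.0
        for k in range(n + 1):
            w = 0.5 if k in (0, n) else 1.0          # trapezoidal end weights
            acc += w * (cmath.exp(-1j * zetas[k] * y) * phis[k]).real
        densities.append(acc * d_zeta / math.pi)
    return densities


def trapezoid(values, step):
    """Trapezoidal rule on equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


if __name__ == "__main__":
    step = 0.025
    ys = [-0.5 + k * step for k in range(181)]       # spans the support [0, ln 36] of Y
    f = pdf_y(ys, INTERVALS)
    print("integral of f_Y:", trapezoid(f, step))
    print("mean of Y:      ", trapezoid([y * v for y, v in zip(ys, f)], step))
```

With more factors Φ_Y decays faster in ζ, so the truncation matters less; very narrow intervals, such as the one for N_s, would instead require a much larger ζ_max.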


We are now just one more step from finding the
probability density of N, the number of
ExtraTerrestrial Civilizations in the Galaxy predicted
by our Statistical Drake equation (3). The point here
is to transfer from the probability density function of
Y to that of N, knowing that Y = ln(N), or
alternatively, that N = exp(Y), as stated by (6). We
must thus resort to the transformation law of random
variables (9) by setting

$$y = g(x) = e^{x}. \tag{40}$$

This, upon inversion, yields the single root

$$x_1(y) = g^{-1}(y) = \ln(y). \tag{41}$$

On the other hand, differentiating (40) one gets

$$g'(x) = e^{x} \quad\text{and}\quad g'(x_1(y)) = e^{\ln(y)} = y, \tag{42}$$

where (41) was already used in the last step. The
general transformation law (9) finally yields

$$f_N(y) = \frac{f_Y(x_1(y))}{|g'(x_1(y))|} = \frac{1}{y}\,f_Y(\ln(y)). \tag{43}$$
This probability density function f_N(y) was
computed numerically by using (43) and the numeric
curve given by (39), and the result is shown in Figure
3.


[Figure 3: PROBABILITY DENSITY FUNCTION OF N; horizontal axis: N = Number of ET Civilizations in Galaxy.]

Figure 3. The numeric (and not analytic) probability
density function curve f_N(y) of the number N of
ExtraTerrestrial Civilizations in the Galaxy according
to the Statistical Drake equation (3). We see that the
curve peak (i.e. the mode) is very close to low values
of N, but the tail on the right is high, meaning that the
resulting mean value ⟨N⟩ is of the order of
thousands.


We now want to compute the mean value ⟨N⟩
of the probability density (43). Clearly, it is given by

$$\langle N\rangle = \int_0^{\infty} y\,f_N(y)\,dy. \tag{44}$$

This integral too was computed numerically, and the
result was a perfect match with N = 3500 of (22), that
is

$$\langle N\rangle = 3499.99880177509 + 0.0000001249146861\,j. \tag{45}$$

Note that this result was computed numerically in the
complex domain because of the Fourier transforms,
and that the real part is virtually 3500 (as expected)
while the imaginary part is virtually zero, its small
residual value being due to rounding errors. So, this result is excellent, and
confirms that the theory presented so far is
mathematically correct.
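The agreement between (45) and (22) is not a coincidence: for independent random variables with finite means, the mean of a product equals the product of the means, so ⟨N⟩ = ∏⟨D_i⟩ whatever the input distributions are, and with the mean values of Table 1 this product is the classical value 3500. The Python sketch below illustrates this on four hypothetical uniform intervals (not the Table 1 inputs) by comparing the exact product of the means (12) with a Monte Carlo average.

```python
import math
import random

# Hypothetical uniform intervals (a_i, b_i), for illustration only.
INTERVALS = [(1.0, 3.0), (0.5, 2.0), (2.0, 6.0), (0.1, 0.3)]


def product_of_means(intervals):
    """Exact mean of N = D_1 * ... * D_k for independent uniform D_i: product of (a_i + b_i)/2."""
    return math.prod((a + b) / 2.0 for a, b in intervals)


def monte_carlo_mean(intervals, samples=200_000, seed=7):
    """Average of products of independently sampled uniform D_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += math.prod(rng.uniform(a, b) for a, b in intervals)
    return total / samples


if __name__ == "__main__":
    print("product of means:", product_of_means(INTERVALS))
    print("Monte Carlo mean:", monte_carlo_mean(INTERVALS))
```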


Finally we want to consider the standard
deviation. This also had to be computed numerically,
resulting in

$$\sigma_N = 3953.42910143389 + 0.000000032800058\,j. \tag{46}$$

This standard deviation, higher than the mean value,
implies that N might range between 0 and 7453.


This completes our study of the probability
density function of N if the seven uniform Drake
input random variables D_i have the mean values and
standard deviations listed in Table 1.

We conclude that, unfortunately, even under the
simplifying assumption that the D_i be uniformly
distributed, we were unable to solve the full problem
analytically, since all calculations beyond equation
(38) had to be performed numerically.

This is no good.

Shall we thus lose faith, and declare “impossible”
the task of finding an analytic expression for the
probability density function f_N(y)?

Rather surprisingly, the answer is “no”, and there
is indeed a way out of this dead-end, as we shall see
in the next section.


5. THE CENTRAL LIMIT THEOREM (CLT) 
OF STATISTICS 


Indeed there is a good, approximating analytical
expression for f_N(y), and this is the following
lognormal probability density function

$$f_N(y;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma\,y}\,e^{-\frac{(\ln(y)-\mu)^2}{2\sigma^2}} \qquad (y \ge 0). \tag{47}$$

To understand why, we must resort to what is
perhaps the most beautiful theorem of Statistics:
the Central Limit Theorem (abbreviated CLT).
Historically, the CLT was in fact first proven, in a
general form, in 1901 by the Russian mathematician Alexandr
Lyapunov (1857-1918), and later (1920) by the
Finnish mathematician Jarl Waldemar Lindeberg
(1876-1932) under weaker conditions. These
conditions are certainly fulfilled in the context of
the Drake equation because of the “reality” of the
astronomy, biology and sociology involved with it,
and we are not going to discuss this point any
further here. A good, concise description of the
Central Limit Theorem (CLT) of Statistics is found
at the Wikipedia site (ref. [7]), to which the reader
is referred for more details, such as the equations
for the Lyapunov and the Lindeberg conditions
making the theorem “rigorously” valid.
Put in loose terms, the CLT states that, if one
has a sum of independent random variables, even NOT
identically distributed, this sum tends to a normal
distribution when the number of terms making up
the sum tends to infinity (provided the Lyapunov or
Lindeberg conditions hold). Also, the normal
distribution mean value is the sum of the mean
values of the addend random variables, and the
normal distribution variance is the sum of the
variances of the addend random variables.


Let us now write down the equations of the CLT
in the form needed to apply it to our Statistical Drake
equation (3). The idea is to apply the CLT to the sum
of random variables given by (4) and (5), whatever
their probability distributions may be. In
other words, the CLT applied to the Statistical Drake
equation (3) leads immediately to the following three
equations:

1) The sum of the (arbitrarily distributed)
independent random variables Y_i makes up
the new random variable Y.

2) The sum of their mean values makes up the
new mean value of Y.

3) The sum of their variances makes up the
new variance of Y.

In equations:

$$\begin{cases} Y = \displaystyle\sum_{i=1}^{7} Y_i \\ \mu \equiv \langle Y\rangle = \displaystyle\sum_{i=1}^{7}\langle Y_i\rangle \\ \sigma^2 \equiv \sigma_Y^2 = \displaystyle\sum_{i=1}^{7}\sigma_{Y_i}^2 \end{cases} \tag{48}$$

This completes our concise description of the CLT
for sums of random variables.
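To see the CLT at work on (48), the following Python sketch draws samples of Y = ln(D_1 ⋯ D_7) for seven hypothetical uniform intervals (not the Table 1 inputs) and compares the sample mean and variance with the sums of (31) and (33). Those two sums hold exactly for independent addends; the CLT part is the shape, which the sketch probes by reporting the fraction of samples within one standard deviation of the mean, about 0.683 for a normal distribution.

```python
import math
import random

# Seven hypothetical uniform intervals (a_i, b_i); not the Table 1 inputs.
INTERVALS = [(1, 3), (0.5, 2), (2, 6), (1, 4), (0.3, 1), (3, 9), (1, 5)]


def clt_parameters(intervals):
    """mu and sigma^2 of Y = sum of ln(D_i), from (48) together with (31) and (33)."""
    mu = sum((b * (math.log(b) - 1) - a * (math.log(a) - 1)) / (b - a) for a, b in intervals)
    var = sum(1 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2 for a, b in intervals)
    return mu, var


def sample_y(intervals, samples=100_000, seed=3):
    """Draw samples of Y = ln(D_1 * ... * D_7) with independent uniform D_i."""
    rng = random.Random(seed)
    return [sum(math.log(rng.uniform(a, b)) for a, b in intervals) for _ in range(samples)]


if __name__ == "__main__":
    mu, var = clt_parameters(INTERVALS)
    ys = sample_y(INTERVALS)
    m = sum(ys) / len(ys)
    v = sum((y - m) ** 2 for y in ys) / (len(ys) - 1)
    inside = sum(abs(y - mu) < math.sqrt(var) for y in ys) / len(ys)
    print(f"mu = {mu:.4f} (sample {m:.4f}), sigma^2 = {var:.4f} (sample {v:.4f})")
    print(f"fraction within one sigma: {inside:.3f} (normal value 0.683)")
```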


6. THE LOGNORMAL DISTRIBUTION IS
THE DISTRIBUTION OF THE NUMBER
N OF EXTRATERRESTRIAL
CIVILIZATIONS IN THE GALAXY


The CLT may of course be extended to products
of random variables upon taking the logs of both
sides, just as we did with equation (3). It then follows
that the exponent random variable, like Y in (6),
tends to a normal random variable, and, as a
consequence, the base random
variable, like N in (6), tends to a lognormal random
variable.




To understand this fact better in mathematical
terms, consider again the transformation law (9) of
random variables. The question is: what is the
probability density function of the random variable N
in equation (6), that is, what is the probability density
function of the lognormal distribution? To find it, set

$$y = g(x) = e^{x}. \tag{49}$$

This, upon inversion, yields the single root

$$x_1(y) = g^{-1}(y) = \ln(y). \tag{50}$$

On the other hand, differentiating (49) one gets

$$g'(x) = e^{x} \quad\text{and}\quad g'(x_1(y)) = e^{\ln(y)} = y, \tag{51}$$

where (50) was already used in the last step. The
general transformation law (9) finally yields

$$f_N(y) = \frac{f_Y(x_1(y))}{|g'(x_1(y))|} = \frac{1}{y}\,f_Y(\ln(y)). \tag{52}$$

Therefore, replacing the probability density on the
right by virtue of the well-known normal (or
Gaussian) distribution given by equation (7), the
lognormal distribution of equation (47) is found, and
the derivation of the lognormal distribution from the
normal distribution is proved.


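In view of future calculations, it is also useful to
point out the so-called “Gaussian integral”, that is:

$$\int_{-\infty}^{\infty} e^{-Ax^2 - Bx}\,dx = \sqrt{\frac{\pi}{A}}\;e^{\frac{B^2}{4A}} \qquad (A > 0,\ B \text{ real}). \tag{53}$$

This follows immediately from the normalization
condition of the Gaussian (7), that is

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1, \tag{54}$$

just upon expanding the square at the exponent and
making the two replacements (we skip all steps)

$$A = \frac{1}{2\sigma^2}, \qquad B = -\frac{\mu}{\sigma^2}. \tag{55}$$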
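A quick numerical check of (53), and of the replacements (55) that turn it into the normalization (54), can be done with a plain trapezoidal rule. The Python sketch below does this for a few arbitrary values of A and B (our own choices), integrating over a finite window wide enough that the neglected tails are negligible.

```python
import math


def gaussian_integral_closed_form(A, B):
    """Right-hand side of (53): sqrt(pi / A) * exp(B^2 / (4 A)), valid for A > 0."""
    return math.sqrt(math.pi / A) * math.exp(B * B / (4.0 * A))


def gaussian_integral_numeric(A, B, half_width=40.0, steps=200_000):
    """Left-hand side of (53) by the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * math.exp(-A * x * x - B * x)
    return total * h


if __name__ == "__main__":
    for A, B in [(0.7, -1.3), (0.125, 0.5), (2.0, 3.0)]:
        print(A, B, gaussian_integral_numeric(A, B), gaussian_integral_closed_form(A, B))
```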
In the sequel of this paper we shall denote the
independent variable of the lognormal distribution
(47) by a lower case letter n to remind the reader that
the corresponding random variable N is the positive
integer number of ExtraTerrestrial Civilizations in
the Galaxy. In other words, n will be treated as a
positive real number in all calculations to follow
because it is a “large” number (i.e. a continuous
variable) compared to the only civilization that we
know of, i.e. ourselves. In conclusion, from now on
the lognormal probability density function of N will
be written as

$$f_N(n;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma\,n}\,e^{-\frac{(\ln(n)-\mu)^2}{2\sigma^2}} \qquad (n \ge 0). \tag{56}$$


Having said so, we now turn to the statistical
properties of the lognormal distribution (56), i.e. to
the statistical properties that describe the number
of ExtraTerrestrial Civilizations in the Galaxy.

Our first goal is to prove an equation yielding all
the moments of the lognormal distribution (56), that
is, for every non-negative integer k = 0, 1, 2, ... one
has

$$\langle N^k\rangle = \int_0^{\infty} n^k\,f_N(n;\mu,\sigma)\,dn = e^{k\mu}\,e^{\frac{k^2\sigma^2}{2}}. \tag{57}$$


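The relevant proof starts with the definition of the
k-th moment

$$\langle N^k\rangle = \int_0^{\infty} n^k\,\frac{1}{\sqrt{2\pi}\,\sigma\,n}\,e^{-\frac{(\ln(n)-\mu)^2}{2\sigma^2}}\,dn.$$

One then transforms the above integral by
virtue of the substitution

$$\ln(n) = z. \tag{58}$$

The new integral in z is then seen to
reduce to the Gaussian integral (53)
(we skip all steps here), and (57)
follows.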
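For the reader who wants the skipped steps, one way to fill them in is sketched below. With (58) one has n = e^z and dn = e^z dz, so n^k dn / n = e^{kz} dz, and the exponent becomes a quadratic in z that matches (53) with A = 1/(2σ²) and B = −(μ/σ² + k):

$$
\begin{aligned}
\langle N^k\rangle &= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{kz}\,e^{-\frac{(z-\mu)^2}{2\sigma^2}}\,dz
= \frac{e^{-\frac{\mu^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-\frac{z^2}{2\sigma^2} + \left(\frac{\mu}{\sigma^2} + k\right) z}\,dz \\
&= \frac{e^{-\frac{\mu^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma}\,\sqrt{2\pi\sigma^2}\;e^{\frac{\sigma^2}{2}\left(\frac{\mu}{\sigma^2} + k\right)^2}
= e^{-\frac{\mu^2}{2\sigma^2}}\;e^{\frac{\mu^2}{2\sigma^2} + k\mu + \frac{k^2\sigma^2}{2}}
= e^{k\mu}\,e^{\frac{k^2\sigma^2}{2}}.
\end{aligned}
$$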
Upon setting k = 0 into (57), the
normalization condition for f_N(n) follows

$$\int_0^{\infty} f_N(n)\,dn = 1. \tag{59}$$

Upon setting k = 1 into (57), the important
mean value of the random variable N is found

$$\langle N\rangle = e^{\mu}\,e^{\frac{\sigma^2}{2}}. \tag{60}$$

Upon setting k = 2 into (57), the mean value
of the square of the random variable N is found

$$\langle N^2\rangle = e^{2\mu}\,e^{2\sigma^2}. \tag{61}$$

The variance of N now follows from the last two
formulae:

$$\sigma_N^2 = \langle N^2\rangle - \langle N\rangle^2 = e^{2\mu}\,e^{\sigma^2}\left(e^{\sigma^2} - 1\right). \tag{62}$$

The square root of this is the important standard
deviation formula for the N random variable

$$\sigma_N = e^{\mu}\,e^{\frac{\sigma^2}{2}}\sqrt{e^{\sigma^2} - 1}. \tag{63}$$

The third moment is obtained upon setting
k = 3 into (57)

$$\langle N^3\rangle = e^{3\mu}\,e^{\frac{9}{2}\sigma^2}. \tag{64}$$

Finally, upon setting k = 4, the fourth moment
of N is found

$$\langle N^4\rangle = e^{4\mu}\,e^{8\sigma^2}. \tag{65}$$
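Our next goal is to find the cumulants of N. In
principle, we could compute all the cumulants K_i
from the generic i-th moment μ_i by virtue of the
recursion formula (see ref. [8])

$$K_i = \mu_i - \sum_{k=1}^{i-1}\binom{i-1}{k-1}\,K_k\,\mu_{i-k}. \tag{66}$$

In practice, however, here we shall confine
ourselves to the computation of the first four
cumulants, because only they are required to
find the skewness and kurtosis of the distribution.
The first four cumulants in terms of the first
four moments read (here μ_i denotes the i-th moment ⟨N^i⟩, not the lognormal parameter μ):

$$\begin{cases} K_1 = \mu_1 \\ K_2 = \mu_2 - K_1^2 \\ K_3 = \mu_3 - 3K_1K_2 - K_1^3 \\ K_4 = \mu_4 - 4K_1K_3 - 3K_2^2 - 6K_1^2K_2 - K_1^4 \end{cases} \tag{67}$$

These equations yield, respectively:

$$K_1 = e^{\mu}\,e^{\frac{\sigma^2}{2}}, \tag{68}$$

$$K_2 = e^{2\mu}\,e^{\sigma^2}\left(e^{\sigma^2} - 1\right), \tag{69}$$

$$K_3 = e^{3\mu}\,e^{\frac{3}{2}\sigma^2}\left(e^{\sigma^2} - 1\right)^2\left(e^{\sigma^2} + 2\right), \tag{70}$$

$$K_4 = e^{4\mu}\,e^{2\sigma^2}\left(e^{\sigma^2} - 1\right)^3\left(e^{3\sigma^2} + 3e^{2\sigma^2} + 6e^{\sigma^2} + 6\right). \tag{71}$$

From these we derive the skewness

$$\frac{K_3}{(K_2)^{3/2}} = \left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1} \tag{72}$$

and the kurtosis (in the sense of the excess kurtosis K_4/K_2²)

$$\frac{K_4}{(K_2)^2} = e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 6. \tag{73}$$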
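The recursion (66) and the closed forms (69)-(73) can be cross-checked numerically. The Python sketch below builds the raw moments from (57), runs the recursion, and compares skewness and kurtosis with (72) and (73); the parameter values are hypothetical. As in (66), `moments[i]` stands for the i-th raw moment ⟨N^i⟩, not for the lognormal parameter μ.

```python
import math


def lognormal_moment(k, mu, s2):
    """k-th raw moment (57) of the lognormal: exp(k*mu + k^2 * sigma^2 / 2)."""
    return math.exp(k * mu + k * k * s2 / 2.0)


def cumulants_from_moments(moments):
    """Recursion (66): K_i = m_i - sum_{k=1}^{i-1} C(i-1, k-1) K_k m_{i-k}.

    moments[i] holds the i-th raw moment; moments[0] (equal to 1) is not used.
    """
    K = [0.0] * len(moments)
    for i in range(1, len(moments)):
        K[i] = moments[i] - sum(math.comb(i - 1, k - 1) * K[k] * moments[i - k]
                                for k in range(1, i))
    return K


if __name__ == "__main__":
    mu, s2 = 0.3, 0.25                      # hypothetical mu and sigma^2
    m = [lognormal_moment(k, mu, s2) for k in range(5)]
    K = cumulants_from_moments(m)
    e = math.exp(s2)
    print("skewness:", K[3] / K[2] ** 1.5, "vs (72):", (e + 2) * math.sqrt(e - 1))
    print("kurtosis:", K[4] / K[2] ** 2, "vs (73):", e ** 4 + 2 * e ** 3 + 3 * e ** 2 - 6)
```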


Finally, we want to find the mode of the
lognormal probability density function, i.e. the
abscissa of its peak. To do so, we must first
compute the derivative of the probability density
function f_N(n) of equation (56), and then set it
equal to zero. This derivative is actually the
derivative of the ratio of two functions of n, as
plainly appears from (56). Thus, let us set for a
moment

$$E(n) = -\frac{(\ln(n) - \mu)^2}{2\sigma^2}, \tag{74}$$

where “E” stands for “exponent.” Upon
differentiating this, one gets

$$E'(n) = -\frac{2(\ln(n) - \mu)}{2\sigma^2}\cdot\frac{1}{n} = -\frac{\ln(n) - \mu}{\sigma^2\,n}. \tag{75}$$

But the lognormal probability density function (56),
by virtue of (74), now reads

$$f_N(n) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\frac{e^{E(n)}}{n}, \tag{76}$$

so that its derivative is

$$\frac{df_N(n)}{dn} = \frac{1}{\sqrt{2\pi}\,\sigma}\,\frac{e^{E(n)}E'(n)\,n - e^{E(n)}}{n^2} = \frac{e^{E(n)}}{\sqrt{2\pi}\,\sigma\,n^2}\left[E'(n)\,n - 1\right]. \tag{77}$$

Setting this derivative equal to zero means setting

$$E'(n)\,n - 1 = 0, \tag{78}$$

that is, upon replacing (75),

$$-\frac{\ln(n) - \mu}{\sigma^2} - 1 = 0. \tag{79}$$

Rearranging, this becomes

$$\ln(n) - \mu + \sigma^2 = 0, \tag{80}$$

and finally

$$n_{\text{mode}} = n_{\text{peak}} = e^{\mu - \sigma^2}. \tag{81}$$


This is the most likely number of ExtraTerrestrial
Civilizations in the Galaxy.

How likely? To find the value of the probability
density function f_N(n) corresponding to this
value of the mode, we must obviously replace (81)
into (56). After a few rearrangements, one then
gets

$$f_N(n_{\text{mode}}) = \frac{e^{\frac{\sigma^2}{2} - \mu}}{\sqrt{2\pi}\,\sigma}. \tag{82}$$

This is “how likely” the most likely number of
ExtraTerrestrial Civilizations in the Galaxy is, i.e.
it is the peak height in the lognormal probability
density function f_N(n).
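As an independent check of (81) and (82), the Python sketch below locates the peak of (56) by brute-force search on a logarithmic grid of n and compares it with e^{μ−σ²} and with the peak height (82). The parameter values are hypothetical, and the scanned range of ln(n) is an assumption chosen wide enough to contain μ − σ².

```python
import math


def lognormal_pdf(n, mu, sigma):
    """Equation (56) for n > 0."""
    return math.exp(-(math.log(n) - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma * n)


def mode_by_grid(mu, sigma, points=200_000):
    """Brute-force peak search for (56) over ln(n) in [mu - sigma^2 - 5 sigma, mu + 5 sigma]."""
    lo, hi = mu - sigma ** 2 - 5 * sigma, mu + 5 * sigma
    best_n, best_f = None, -1.0
    for k in range(points + 1):
        n = math.exp(lo + (hi - lo) * k / points)
        f = lognormal_pdf(n, mu, sigma)
        if f > best_f:
            best_n, best_f = n, f
    return best_n, best_f


if __name__ == "__main__":
    mu, sigma = 2.0, 0.7          # hypothetical parameters
    n_grid, f_grid = mode_by_grid(mu, sigma)
    print("grid search:", n_grid, f_grid)
    print("(81), (82): ", math.exp(mu - sigma ** 2),
          math.exp(sigma ** 2 / 2 - mu) / (math.sqrt(2 * math.pi) * sigma))
```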


Next to the mode, the median m (ref. [9]) is one
more statistical number used to characterize any
probability distribution. It is defined as the
independent variable abscissa m such that a
realization of the random variable will take up a
value lower than m with 50% probability or a value
higher than m with 50% probability again. In other
words, the median m splits up our probability
density into exactly two equally probable parts. Since
the probability of occurrence of the random event
equals the area under its density curve (i.e. the
definite integral under its density curve), the
median m (of the lognormal distribution, in this
case) is defined as the integral upper limit m:

$$\int_0^{m} f_N(n)\,dn = \int_0^{m} \frac{1}{\sqrt{2\pi}\,\sigma\,n}\,e^{-\frac{(\ln(n)-\mu)^2}{2\sigma^2}}\,dn = \frac{1}{2}. \tag{83}$$

In order to find m, we cannot simply differentiate (83) with
respect to m, since the “precise” factor 1/2 on the
right would then disappear into a zero. Instead,
we may perform the obvious
substitution

$$z = \frac{\ln(n) - \mu}{\sqrt{2}\,\sigma} \tag{84}$$

into the integral (83) to reduce it to the following
integral defining the error function erf(x)

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^{x} e^{-t^2}\,dt. \tag{85}$$

Then, after a few reductions that we skip for the sake
of brevity, the full equation (83) is turned into

$$\frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{\ln(m) - \mu}{\sqrt{2}\,\sigma}\right) = \frac{1}{2}, \tag{86}$$

that is

$$\mathrm{erf}\!\left(\frac{\ln(m) - \mu}{\sqrt{2}\,\sigma}\right) = 0. \tag{87}$$

Since from the definition (85) one obviously has
erf(0) = 0, (87) becomes

$$\frac{\ln(m) - \mu}{\sqrt{2}\,\sigma} = 0, \tag{88}$$

whence finally

$$\text{median} = m = e^{\mu}. \tag{89}$$

This is the median of the lognormal distribution of
N. In other words, this is the number of
ExtraTerrestrial civilizations in the Galaxy such
that, with 50% probability, the actual value of N will
be lower than this median, and with 50% probability
it will be higher.

In conclusion, we find it useful to summarize all the
equations that we derived about the random variable
N in the following Table 2.

Random variable: N = number of communicating ET civilizations in Galaxy
Probability distribution: lognormal
Probability density function: $f_N(n;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma\,n}\,e^{-\frac{(\ln(n)-\mu)^2}{2\sigma^2}}$ (n ≥ 0)
Mean value: $\langle N\rangle = e^{\mu}\,e^{\sigma^2/2}$
Variance: $\sigma_N^2 = e^{2\mu}\,e^{\sigma^2}\left(e^{\sigma^2}-1\right)$
Standard deviation: $\sigma_N = e^{\mu}\,e^{\sigma^2/2}\sqrt{e^{\sigma^2}-1}$
Mode (= peak): $n_{\text{mode}} = n_{\text{peak}} = e^{\mu-\sigma^2}$
Expression of μ in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i: $\mu = \sum_{i=1}^{7}\frac{b_i[\ln(b_i)-1]-a_i[\ln(a_i)-1]}{b_i-a_i}$
Expression of σ² in terms of the lower (a_i) and upper (b_i) limits of the Drake uniform input random variables D_i: $\sigma^2 = \sum_{i=1}^{7}\left[1-\frac{a_i b_i[\ln(b_i)-\ln(a_i)]^2}{(b_i-a_i)^2}\right]$

Table 2. Summary of the properties of the lognormal distribution that applies to the random variable N = number of
ET communicating civilizations in the Galaxy.


We want to complete this section about the
lognormal probability density function (56) by
finding out its numeric values for the inputs to the
Statistical Drake equation (3) listed in Table 1.

According to the CLT, the mean value μ to be
inserted into the lognormal density (56) is given
(according to the second equation (48)) by the sum of
all the mean values ⟨Y_i⟩, that is, by virtue of (31), by:

$$\mu = \sum_{i=1}^{7}\langle Y_i\rangle = \sum_{i=1}^{7}\frac{b_i[\ln(b_i) - 1] - a_i[\ln(a_i) - 1]}{b_i - a_i}. \tag{90}$$

Upon replacing into (90) the 14 a_i and b_i that follow
from Table 1 via (17), the following numeric mean value μ is
found

(91)


Similarly, to get the numeric variance σ² one
must resort to the last of equations (48) and to (33):

$$\sigma^2 = \sigma_Y^2 = \sum_{i=1}^{7}\sigma_{Y_i}^2 = \sum_{i=1}^{7}\left[1 - \frac{a_i b_i\,[\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2}\right], \tag{92}$$

yielding the following numeric variance σ² to be
inserted into the lognormal pdf (56)

$$\sigma^2 = 1.938725, \tag{93}$$

whence the numeric standard deviation σ

$$\sigma = 1.392381. \tag{94}$$


Upon replacing these two numeric values (91)
and (94) into the lognormal pdf (56), the latter is
completely determined. It is plotted in Figure 4
hereafter as the thin curve.

In other words, Figure 4 shows the lognormal
distribution for the number N of ExtraTerrestrial
Civilizations in the Galaxy derived from the Central
Limit Theorem as applied to the Drake equation
(with the input data listed in Table 1).


We would now like to point out the most important
statistical properties of this lognormal pdf:

1) Mean Value of N. This is given by equation (60)
with μ and σ given by (91) and (94), respectively:

$$\langle N\rangle = e^{\mu}\,e^{\frac{\sigma^2}{2}} = 4590. \tag{95}$$

In other words, according to the Central Limit Theorem of
Statistics with the inputs of Table 1, the mean number of
ET Civilizations in the Galaxy is 4590. This number
4590 is HIGHER than the 3500 foreseen by the
classical Drake equation working with sheer
numbers only, rather than with probability
distributions. Thus equation (95) IS GOOD
NEWS FOR SETI, since it shows that the expected
number of ETs is HIGHER with an adequate
statistical treatment than just with the too simple
Drake sheer numbers of (1).


2) Variance of N. The variance of the lognormal
distribution is given by (62) and turns out to be a
huge number:

$$\sigma_N^2 = e^{2\mu}\,e^{\sigma^2}\left(e^{\sigma^2} - 1\right) = 125328623. \tag{96}$$


3) Standard deviation of N. The standard deviation
of the lognormal distribution is given by (63) and
turns out to be:

$$\sigma_N = e^{\mu}\,e^{\frac{\sigma^2}{2}}\sqrt{e^{\sigma^2} - 1} = 11195. \tag{97}$$

Again, this is GOOD NEWS FOR SETI. In fact,
such a high standard deviation means that N may
range from very low values (zero, theoretically, and
one since Humanity exists) up to tens of thousands
(4590 + 11195 = 15785, i.e. (95) + (97)).


4) Mode of N. The mode (= peak abscissa) of the
lognormal distribution of N is given by (81), and has
a surprisingly low numeric value:

$$n_{\text{mode}} = n_{\text{peak}} = e^{\mu - \sigma^2} \approx 250. \tag{98}$$

This is well shown in Figure 4: the mode peak is very
pronounced and close to the origin, but the right tail
is high, and this means that the mean value of the
distribution is much higher than the mode:
4590 >> 250.
5) Median of N. The median (= fifty-fifty abscissa,
splitting the pdf into two exactly equiprobable parts)
of the lognormal distribution of N is given by (89),
and has the numeric value:

$$\text{median} = m = e^{\mu} \approx 1740. \tag{99}$$

In words, assuming the input values listed in Table 1,
we have exactly a 50% probability that the actual
value of N is lower than 1740, and 50% that it is
higher than 1740.
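The numeric results (95)-(99) follow from μ and σ² alone. Since the numeric value of μ from (91) is not quoted in the text above, the Python sketch below assumes μ = ln(1740), i.e. it backs μ out of the median (99), and uses σ² = 1.938725 from (93). With these inputs, the mean, standard deviation and mode it prints should agree with (95), (97) and (98) to within the rounding of 1740.

```python
import math


def lognormal_summary(mu, s2):
    """Mean (60), standard deviation (63), mode (81) and median (89) of N, given mu and sigma^2."""
    mean = math.exp(mu + s2 / 2.0)
    std = mean * math.sqrt(math.exp(s2) - 1.0)
    mode = math.exp(mu - s2)
    median = math.exp(mu)
    return mean, std, mode, median


if __name__ == "__main__":
    s2 = 1.938725              # sigma^2 from (93)
    mu = math.log(1740.0)      # ASSUMPTION: mu backed out of the median (99), m = e^mu
    mean, std, mode, median = lognormal_summary(mu, s2)
    print(f"mean ~ {mean:.0f}, std ~ {std:.0f}, mode ~ {mode:.0f}, median ~ {median:.0f}")
```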


7. COMPARING THE CLT RESULTS 
WITH THE NON-CLT RESULTS 


The time is now ripe to compare the CLT-based
results about the lognormal distribution of N,
just described in Section 5, against the non-CLT-based
results obtained numerically in Section 3.3.

To do so in a simple, visual way, let us plot on
the same diagram (Figure 4) two curves:

1) The numeric curve appearing in Figure 3
and obtained after laborious Fourier
transform calculations in the complex
domain, and

2) The lognormal distribution (56) with
numeric μ and σ given by (91) and (94),
respectively.

We see that the two curves are virtually coincident
for values of N larger than 1500. This is a
consequence of the law of large numbers, of which
the CLT is just one of the many facets.

The same happens for the natural log of N, i.e. the
random variable Y of (5), which is plotted in Figure 5
both in its normal curve version (thin curve) and in
its numeric version, obtained via Fourier transforms
and already shown in Figure 2.

The conclusion is simple: from now on we shall
discard forever the numeric calculations and we'll
stick only to the equations derived by virtue of the
CLT, i.e. to the lognormal (56) and its
consequences.
[Figure 4: PROBABILITY DENSITY FUNCTION OF N; vertical axis: Prob. density function of N; horizontal axis: N = Number of ET Civilizations in Galaxy.]

Figure 4. Comparing the two probability density functions of the random variable N found:
1) At the end of Section 3.3, in a purely numeric way and without resorting to the CLT at all (thick curve), and
2) Analytically, by using the CLT and the relevant lognormal approximation (thin curve).


[Figure 5: PROBABILITY DENSITY FUNCTION OF Y = ln(N); vertical axis: Probability density function of Y; horizontal axis: Independent variable Y = ln(N).]

Figure 5. Comparing the two probability density functions of the random variable Y = ln(N) found:
1) At the end of Section 3.3, in a purely numeric way and without resorting to the CLT at all (thick curve), and
2) Analytically, by using the CLT and the relevant normal (Gaussian) approximation (thin Gaussian curve).


8. DISTANCE OF THE NEAREST
EXTRATERRESTRIAL CIVILIZATION
AS A PROBABILITY DISTRIBUTION


As an application of the Statistical Drake
Equation developed in the previous sections of this
paper, we now want to consider the problem of
estimating the distance of the ExtraTerrestrial
Civilization nearest to us in the Galaxy. In all
Astrobiology textbooks (see, for instance, ref. [10])
and in several web sites, the solution to this
problem is reported with only slight differences in
the mathematical proofs among the various authors.
In the first of the coming two sections (section 7.1)
we derive the expression for this “ET_Distance”
(as we like to denote it) in the classical, non-probabilistic
way: in other words, this is the
classical, deterministic derivation. In the second
section (7.2) we provide the probabilistic
derivation, arising from our Statistical Drake

UNCLASSIFIED / /POT?OPPECGEHEUGE-Ohir+ 


UNCLASSIFIED/ 


Equation, of the corresponding probability density 
function fer pisane() : here r is the distance 
between us and the nearest ET civilization 
assumed as the independent variable of its own 
probability density function. The ensuing sections 
previde mere mathematical details about this 
Jrripistane(*) such as ifs mean value, variance, 
standard deviation, all central moments. mode, 
median, cumulants, skewness and kurtosis. 


CLASSICAL, NON-PROBABILISTIC 
DERIVATION OF THE DISTANCE OF THE 
NEAREST ET CIVILIZATION 


Consider the Galactic Disk and assume that: 

1) The diameter of the Galaxy is (about) 100,000
light years (abbreviated ly), i.e. its radius,
$R_{\text{Galaxy}}$, is about 50,000 ly.

2) The thickness of the Galactic Disk at half-way
from its center, $h_{\text{Galaxy}}$, is about 16,000 ly.


Then 

3) The volume of the Galaxy may be 
approximated as the volume of the 
corresponding cylinder, i.e. 


$$ V_{\text{Galaxy}} = \pi\, R_{\text{Galaxy}}^2\, h_{\text{Galaxy}} \qquad (100) $$


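4) Now consider the sphere around us having a
radius r. The volume of such a sphere is

$$ V_{\text{our\_Sphere}} = \frac{4}{3}\pi r^3 = \frac{4}{3}\pi \left(\frac{\text{ET\_Distance}}{2}\right)^3 \qquad (101) $$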
In the last equation, we had to divide the distance
“ET_Distance” between ourselves and the nearest
ET Civilization by 2 because we are now going to
make the unwarranted assumption that all ET
Civilizations are equally spaced from each other in
the Galaxy! This is a crazy assumption, clearly,
and should be replaced by more scientifically
grounded assumptions as soon as we know more
about our Galactic Neighbourhood. At the moment,
however, this is the best guess that we can make,
and so we shall take it for granted, although we are
aware that this is a weak point in the reasoning.


Having thus assumed that ET Civilizations
are UNIFORMLY SPACED IN THE GALAXY,
we can write down this proportion, stating that
the volume of the Galaxy per civilization equals
the volume of one sphere of radius ET_Distance/2:

$$ \frac{V_{\text{Galaxy}}}{N} = \frac{V_{\text{our\_Sphere}}}{1} \qquad (102) $$

That is, upon replacing both (100) and (101) into
(102):


$$ \frac{\pi\, R_{\text{Galaxy}}^2\, h_{\text{Galaxy}}}{N} = \frac{4}{3}\pi \left(\frac{\text{ET\_Distance}}{2}\right)^3 \qquad (103) $$

The only unknown in the last equation is
ET_Distance, and so we may solve for it, thus
getting the (AVERAGE) DISTANCE BETWEEN ANY PAIR
OF NEIGHBOURING CIVILIZATIONS IN THE GALAXY:

$$ \text{ET\_Distance} = \frac{C}{\sqrt[3]{N}} \qquad (104) $$

where the positive constant C is defined by

$$ C = \sqrt[3]{6\, R_{\text{Galaxy}}^2\, h_{\text{Galaxy}}} \approx 28845 \text{ light years}. \qquad (105) $$


Equations (104) and (105) are the starting point for
our first application of the Statistical Drake
equation, which we discuss in detail in the coming
sections of this paper.
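As a minimal sketch of the classical estimate (not code from the original paper), the Python snippet below evaluates (104) and (105) and checks that the resulting distance satisfies the uniform-spacing proportion (102). The function names and the sample value N = 1000 are hypothetical, chosen only for illustration.

```python
import math


def c_constant(r_galaxy_ly: float, h_galaxy_ly: float) -> float:
    """Constant C of (105): cube root of 6 * R_Galaxy^2 * h_Galaxy."""
    return (6.0 * r_galaxy_ly ** 2 * h_galaxy_ly) ** (1.0 / 3.0)


def et_distance(n_civilizations: float, c_ly: float) -> float:
    """Deterministic ET_Distance of (104): C divided by the cube root of N."""
    return c_ly / n_civilizations ** (1.0 / 3.0)


def proportion_residual(n_civilizations: float, r_galaxy_ly: float, h_galaxy_ly: float) -> float:
    """Relative mismatch between the two sides of (102); should be ~0."""
    d = et_distance(n_civilizations, c_constant(r_galaxy_ly, h_galaxy_ly))
    v_galaxy = math.pi * r_galaxy_ly ** 2 * h_galaxy_ly      # cylinder, (100)
    v_sphere = 4.0 / 3.0 * math.pi * (d / 2.0) ** 3          # sphere of radius d/2, (101)
    return abs(v_galaxy / n_civilizations - v_sphere) / v_sphere


if __name__ == "__main__":
    # Hypothetical N = 1000 with C = 28845 ly from (105).
    print(round(et_distance(1000, 28845.0), 1))
    print(proportion_residual(1000, 50_000.0, 16_000.0) < 1e-12)
```

With N = 1000 the cube root is 10, so the first line prints about 2884.5 ly; larger N shrinks the distance only as the cube root.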


PROBABILISTIC DERIVATION OF THE 
PROBABILITY DENSITY FUNCTION FOR 
ET_DISTANCE 


The probability density function (pdf) yielding
the distance of the ET Civilization nearest to us in
the Galaxy and presented in this section was
discovered by this author on September 5th, 2007.
He did not disclose it to other scientists until the
SETI meeting run by the famous mathematical
physicist and popular science author Paul Davies
at the “Beyond” Center of the University of
Arizona at Phoenix, on February 5-8, 2008.
This meeting was also attended by SETI Institute
experts Jill Tarter, Seth Shostak, Doug Vakoch,
Tom Pierson and others. During this author’s talk,
Paul Davies suggested calling the new probability
density function that yields the ET_Distance, derived
in this section, “the Maccone distribution”.
Let us go back to equation (104). Since N is
now a random variable (obeying the lognormal
distribution), it follows that the ET_Distance must
be a random variable as well. Hence it must have
some unknown probability density function that
we denote by

$$ f_{\text{ET\_Distance}}(r) \qquad (106) $$

where r is the new independent variable of such a
probability distribution (it is denoted by r to
remind the reader that it expresses the three-
dimensional radial distance separating us from the
nearest ET civilization in a full spherical symmetry
of the space around us).


The question then is: what is the unknown
probability distribution (106) of the ET_Distance?
We can answer this question upon making the two
formal substitutions

$$ N \to x, \qquad \text{ET\_Distance} \to y \qquad (107) $$

into the transformation law (8) for random
variables. As a consequence, (104) takes the form


$$ y = C\, x^{-1/3} = \frac{C}{\sqrt[3]{x}}. \qquad (108) $$


In order to find the unknown probability density
$f_{\text{ET\_Distance}}(r)$, we now have to apply the rule (9) to
(108). First, notice that (108), when inverted to
yield the various roots $x_i(y)$, yields a single real
root only

$$ x_1(y) = \frac{C^3}{y^3}. \qquad (109) $$

Then, the summation in (9) reduces to one term
only.
Second, differentiating (108) one finds

$$ \frac{dy}{dx} = -\frac{C}{3}\, x^{-4/3}. \qquad (110) $$

Thus, the relevant absolute value reads

$$ \left|\frac{dy}{dx}\right| = \frac{C}{3}\, x^{-4/3}. \qquad (111) $$


Upon replacing (111) into (9), we then find

$$ \left|\frac{dy}{dx}\right|_{x = x_1(y)} = \frac{C}{3}\left(\frac{C^3}{y^3}\right)^{-4/3} = \frac{y^4}{3\, C^3}. $$
This is the denominator of (9). The numerator
simply is the lognormal probability density
function (56) where the old independent variable x
must now be re-written in terms of the new
independent variable y by virtue of (109). By
doing so, we finally arrive at the new probability
density function f(y):

$$ f_{\text{ET\_Distance}}(y) = \frac{\dfrac{1}{\sqrt{2\pi}\,\sigma\,\frac{C^3}{y^3}}\; e^{-\frac{\left[\ln\left(\frac{C^3}{y^3}\right) - \mu\right]^2}{2\sigma^2}}}{\dfrac{y^4}{3\,C^3}}. \qquad (112) $$


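Rearranging and replacing y by r, the final form
is:

$$ f_{\text{ET\_Distance}}(r) = \frac{3}{\sqrt{2\pi}\,\sigma\, r}\; e^{-\frac{\left[\ln\left(\frac{C^3}{r^3}\right) - \mu\right]^2}{2\sigma^2}}. \qquad (113) $$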


Now, just replace C in (113) by virtue of (105).
Then we have discovered the probability density
function $f_{\text{ET\_Distance}}(r)$ such that $f_{\text{ET\_Distance}}(r)\,dr$ is the
probability of finding the nearest ExtraTerrestrial
Civilization in the Galaxy in the spherical shell
between the distances r and r+dr from Earth:

$$ f_{\text{ET\_Distance}}(r) = \frac{3}{\sqrt{2\pi}\,\sigma\, r}\; e^{-\frac{\left[\ln\left(\frac{6\, R_{\text{Galaxy}}^2\, h_{\text{Galaxy}}}{r^3}\right) - \mu\right]^2}{2\sigma^2}} \qquad (114) $$

holding for r ≥ 0.
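The pdf (113) can be checked numerically. The Python sketch below (hypothetical helper names; μ = 7 and σ = 1 are illustrative values, not the Table 1 inputs) integrates (113) with the trapezoidal rule and compares a partial integral with a Monte Carlo estimate obtained by drawing N from a lognormal with `random.lognormvariate` and mapping each draw through (104), which is what the transformation law (9) describes analytically.

```python
import math
import random


def maccone_pdf(r: float, mu: float, sigma: float, c: float) -> float:
    """Density (113) of ET_Distance; returns 0 at r <= 0, where its limit is 0."""
    if r <= 0.0:
        return 0.0
    z = math.log(c ** 3 / r ** 3) - mu
    return 3.0 / (math.sqrt(2.0 * math.pi) * sigma * r) * math.exp(-z * z / (2.0 * sigma ** 2))


def integrate_pdf(r_max: float, mu: float, sigma: float, c: float, steps: int = 200_000) -> float:
    """Trapezoidal approximation of the integral of (113) from 0 to r_max."""
    h = r_max / steps
    total = 0.5 * (maccone_pdf(0.0, mu, sigma, c) + maccone_pdf(r_max, mu, sigma, c))
    total += sum(maccone_pdf(i * h, mu, sigma, c) for i in range(1, steps))
    return total * h


def sample_et_distance(mu: float, sigma: float, c: float, size: int, seed: int = 0) -> list:
    """Draw N from a lognormal with parameters (mu, sigma), then apply (104)."""
    rng = random.Random(seed)
    return [c / rng.lognormvariate(mu, sigma) ** (1.0 / 3.0) for _ in range(size)]


if __name__ == "__main__":
    MU, SIGMA, C = 7.0, 1.0, 28845.0  # illustrative parameters only
    print(integrate_pdf(100_000.0, MU, SIGMA, C))  # close to 1, cf. (118)
    draws = sample_et_distance(MU, SIGMA, C, 20_000)
    print(sum(d <= 3000.0 for d in draws) / len(draws), integrate_pdf(3000.0, MU, SIGMA, C))
```

The two numbers on the second line should agree to within Monte Carlo noise (about 0.004 for 20,000 draws).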


STATISTICAL PROPERTIES OF THIS 
DISTRIBUTION 
We now want to study this probability
distribution in detail. Our next questions are:

1) What is its mean value?

2) What are its variance and standard
deviation?

3) What are its moments to any higher order?

4) What are its cumulants?

5) What are its skewness and kurtosis?

6) What are the coordinates of its peak, i.e.
the mode (peak abscissa) and its ordinate?

7) What is its median?


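The first three points in the list are all covered
by the following theorem: all the moments of (113)
are given by (here k is the generic and non-
negative integer exponent, i.e. k = 0, 1, 2, 3, ...)

$$ \left\langle \text{ET\_Distance}^k \right\rangle = \int_0^\infty r^k\, f_{\text{ET\_Distance}}(r)\, dr = \int_0^\infty r^k\, \frac{3}{\sqrt{2\pi}\,\sigma\, r}\; e^{-\frac{\left[\ln\left(\frac{C^3}{r^3}\right) - \mu\right]^2}{2\sigma^2}}\, dr = C^k\, e^{-\frac{k\mu}{3}}\, e^{\frac{k^2\sigma^2}{18}}. \qquad (115) $$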


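To prove this result, one first transforms the above
integral by virtue of the substitution

$$ z = \ln\left(\frac{C^3}{r^3}\right). \qquad (116) $$

Since (116) gives $r = C\,e^{-z/3}$, while $f_{\text{ET\_Distance}}(r)\,dr$
becomes the normal density of z with mean μ and
variance σ², the new integral in z is seen to reduce to
the known Gaussian integral (53) and, after several
reductions that we skip for the sake of brevity,
(115) follows from (53). In other words, we have
proven that

$$ \left\langle \text{ET\_Distance}^k \right\rangle = C^k\, e^{-\frac{k\mu}{3}}\, e^{\frac{k^2\sigma^2}{18}}. \qquad (117) $$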


Upon setting k = 0 into (117), the
normalization condition for $f_{\text{ET\_Distance}}(r)$ follows

$$ \int_0^\infty f_{\text{ET\_Distance}}(r)\, dr = 1. \qquad (118) $$
Upon setting k = 1 into (117), the important
mean value of the random variable ET_Distance
is found

$$ \left\langle \text{ET\_Distance} \right\rangle = C\, e^{-\frac{\mu}{3}}\, e^{\frac{\sigma^2}{18}}. \qquad (119) $$


Upon setting k = 2 into (117), the mean value of
the square of the random variable ET_Distance is
found

$$ \left\langle \text{ET\_Distance}^2 \right\rangle = C^2\, e^{-\frac{2\mu}{3}}\, e^{\frac{2\sigma^2}{9}}. \qquad (120) $$


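The variance of ET_Distance now follows from
the last two formulae with a few reductions:

$$ \sigma_{\text{ET\_Distance}}^2 = \left\langle \text{ET\_Distance}^2 \right\rangle - \left\langle \text{ET\_Distance} \right\rangle^2 = C^2\, e^{-\frac{2\mu}{3}} \left[ e^{\frac{2\sigma^2}{9}} - e^{\frac{\sigma^2}{9}} \right]. \qquad (121) $$

So, the variance of ET_Distance is

$$ \sigma_{\text{ET\_Distance}}^2 = C^2\, e^{-\frac{2\mu}{3}}\, e^{\frac{\sigma^2}{9}} \left( e^{\frac{\sigma^2}{9}} - 1 \right). \qquad (122) $$

The square root of this is the important
standard deviation of the ET_Distance random
variable

$$ \sigma_{\text{ET\_Distance}} = C\, e^{-\frac{\mu}{3}}\, e^{\frac{\sigma^2}{18}} \sqrt{e^{\frac{\sigma^2}{9}} - 1}. \qquad (123) $$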


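The third moment is obtained upon setting
k = 3 into (117)

$$ \left\langle \text{ET\_Distance}^3 \right\rangle = C^3\, e^{-\mu}\, e^{\frac{\sigma^2}{2}}. \qquad (124) $$

Finally, upon setting k = 4 into (117), the fourth
moment of ET_Distance is found

$$ \left\langle \text{ET\_Distance}^4 \right\rangle = C^4\, e^{-\frac{4\mu}{3}}\, e^{\frac{8\sigma^2}{9}}. \qquad (125) $$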
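A short Python check of (117) to (125), offered as a sketch with hypothetical function names and illustrative parameters (μ = 7, σ = 1, C = 28845 ly): the closed-form moments are compared with a direct trapezoidal quadrature in the variable z = ln(C³/r³), for which r = C e^(-z/3) and z is normally distributed with mean μ and variance σ².

```python
import math


def moment(k: int, mu: float, sigma: float, c: float) -> float:
    """k-th moment of ET_Distance, equation (117)."""
    return c ** k * math.exp(-k * mu / 3.0) * math.exp(k * k * sigma ** 2 / 18.0)


def variance(mu: float, sigma: float, c: float) -> float:
    """Variance (122)."""
    q = math.exp(sigma ** 2 / 9.0)
    return c ** 2 * math.exp(-2.0 * mu / 3.0) * q * (q - 1.0)


def std_dev(mu: float, sigma: float, c: float) -> float:
    """Standard deviation (123): mean (119) times sqrt(exp(sigma^2/9) - 1)."""
    return moment(1, mu, sigma, c) * math.sqrt(math.exp(sigma ** 2 / 9.0) - 1.0)


def moment_by_quadrature(k: int, mu: float, sigma: float, c: float, steps: int = 20_000) -> float:
    """Trapezoidal average of C^k exp(-k z / 3) over z ~ Normal(mu, sigma^2)."""
    lo, hi = mu - 12.0 * sigma, mu + 12.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        gauss = math.exp(-(z - mu) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
        total += weight * c ** k * math.exp(-k * z / 3.0) * gauss
    return total * h


if __name__ == "__main__":
    for k in range(5):
        print(k, moment(k, 7.0, 1.0, 28845.0), moment_by_quadrature(k, 7.0, 1.0, 28845.0))
```

Working in z rather than r keeps the integrand a smooth Gaussian, so a plain trapezoidal rule is already very accurate here.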
Our next goal is to find the cumulants of the
ET_Distance. In principle, we could compute all
the cumulants $\kappa_i$ from the generic i-th moment
$\mu_i = \langle \text{ET\_Distance}^i \rangle$ (not to be confused with the
lognormal parameter μ) by virtue of the recursion formula (see ref. [8])

$$ \kappa_i = \mu_i - \sum_{k=1}^{i-1} \binom{i-1}{k-1}\, \kappa_k\, \mu_{i-k}. \qquad (126) $$

In practice, however, here we shall confine
ourselves to the computation of the first four
cumulants, because only they are required to find
the skewness and kurtosis of the distribution (113).
Then, the first four cumulants in terms of the first
four moments read:

$$ \begin{aligned} \kappa_1 &= \mu_1 \\ \kappa_2 &= \mu_2 - \kappa_1^2 \\ \kappa_3 &= \mu_3 - 3\kappa_1\kappa_2 - \kappa_1^3 \\ \kappa_4 &= \mu_4 - 4\kappa_1\kappa_3 - 3\kappa_2^2 - 6\kappa_1^2\kappa_2 - \kappa_1^4. \end{aligned} \qquad (127) $$
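These equations yield, respectively:

$$ \kappa_1 = C\, e^{-\frac{\mu}{3}}\, e^{\frac{\sigma^2}{18}}, \qquad (128) $$

$$ \kappa_2 = C^2\, e^{-\frac{2\mu}{3}}\, e^{\frac{\sigma^2}{9}} \left( e^{\frac{\sigma^2}{9}} - 1 \right), \qquad (129) $$

$$ \kappa_3 = C^3\, e^{-\mu}\, e^{\frac{\sigma^2}{6}} \left( e^{\frac{\sigma^2}{3}} - 3\, e^{\frac{\sigma^2}{9}} + 2 \right), \qquad (130) $$

$$ \kappa_4 = C^4\, e^{-\frac{4\mu}{3}}\, e^{\frac{2\sigma^2}{9}} \left( e^{\frac{2\sigma^2}{3}} - 4\, e^{\frac{\sigma^2}{3}} - 3\, e^{\frac{2\sigma^2}{9}} + 12\, e^{\frac{\sigma^2}{9}} - 6 \right). \qquad (131) $$

From these we derive the skewness

$$ \frac{\kappa_3}{(\kappa_2)^{3/2}} = \frac{e^{\frac{\sigma^2}{3}} - 3\, e^{\frac{\sigma^2}{9}} + 2}{\left( e^{\frac{\sigma^2}{9}} - 1 \right)^{3/2}} \qquad (132) $$

and the kurtosis

$$ \frac{\kappa_4}{(\kappa_2)^2} = e^{\frac{4\sigma^2}{9}} + 2\, e^{\frac{\sigma^2}{3}} + 3\, e^{\frac{2\sigma^2}{9}} - 6. \qquad (133) $$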
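To double-check (126) to (133), here is a Python sketch (hypothetical names, illustrative μ and σ) that builds the raw moments from (117), runs the recursion (126) and compares the resulting skewness and kurtosis with the closed forms (132) and (133). Both ratios depend on σ only, so C and μ drop out.

```python
import math


def raw_moments(n: int, mu: float, sigma: float, c: float) -> list:
    """Raw moments mu_0..mu_n of ET_Distance from (117); mu_0 = 1."""
    return [c ** k * math.exp(-k * mu / 3.0 + k * k * sigma ** 2 / 18.0) for k in range(n + 1)]


def cumulants(moments: list) -> list:
    """Recursion (126): kappa_i = mu_i - sum_{k=1}^{i-1} binom(i-1, k-1) kappa_k mu_{i-k}."""
    kappa = [0.0] * len(moments)  # kappa[0] is unused
    for i in range(1, len(moments)):
        kappa[i] = moments[i] - sum(
            math.comb(i - 1, k - 1) * kappa[k] * moments[i - k] for k in range(1, i)
        )
    return kappa


def skewness(sigma: float) -> float:
    """Closed form (132)."""
    q = math.exp(sigma ** 2 / 9.0)
    return (math.exp(sigma ** 2 / 3.0) - 3.0 * q + 2.0) / (q - 1.0) ** 1.5


def kurtosis(sigma: float) -> float:
    """Closed form (133)."""
    s2 = sigma ** 2
    return math.exp(4.0 * s2 / 9.0) + 2.0 * math.exp(s2 / 3.0) + 3.0 * math.exp(2.0 * s2 / 9.0) - 6.0


if __name__ == "__main__":
    kap = cumulants(raw_moments(4, 7.0, 1.0, 28845.0))
    print(kap[3] / kap[2] ** 1.5, skewness(1.0))
    print(kap[4] / kap[2] ** 2, kurtosis(1.0))
```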


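Next we want to find the mode of this
distribution, i.e. the abscissa of its peak. To do so,
we must first compute the derivative of the
probability density function $f_{\text{ET\_Distance}}(r)$ of (113),
and then set it equal to zero. This derivative is
actually the derivative of the ratio of two functions
of r, as plainly appears from (113). Thus, let us
set for a moment

$$ E(r) = \frac{\left[\ln\left(\frac{C^3}{r^3}\right) - \mu\right]^2}{2\sigma^2}, \qquad (134) $$

where “E” stands for “exponent.” Upon
differentiating, one gets

$$ E'(r) = -\frac{3}{\sigma^2\, r}\left[\ln\left(\frac{C^3}{r^3}\right) - \mu\right]. \qquad (135) $$

But the probability density function (113) now
reads

$$ f_{\text{ET\_Distance}}(r) = \frac{3}{\sqrt{2\pi}\,\sigma}\, \frac{e^{-E(r)}}{r}, \qquad (136) $$

so that its derivative is

$$ \frac{d f_{\text{ET\_Distance}}(r)}{dr} = -\frac{3}{\sqrt{2\pi}\,\sigma}\, \frac{e^{-E(r)}}{r^2} \left[ E'(r)\, r + 1 \right]. \qquad (137) $$

Setting this derivative equal to zero means setting

$$ E'(r)\, r + 1 = 0. \qquad (138) $$

That is, upon replacing (135) into (138), we get

$$ -\frac{3}{\sigma^2}\left[\ln\left(\frac{C^3}{r^3}\right) - \mu\right] + 1 = 0. \qquad (139) $$

Rearranging, this becomes

$$ \ln\left(\frac{C^3}{r^3}\right) - \mu - \frac{\sigma^2}{3} = 0, \qquad (140) $$

that is

$$ 3 \ln\left(\frac{C}{r}\right) - \mu - \frac{\sigma^2}{3} = 0, \qquad (141) $$

whence

$$ \ln\left(\frac{C}{r}\right) = \frac{\mu}{3} + \frac{\sigma^2}{9}, \qquad (142) $$

and finally

$$ r_{\text{mode}} = C\, e^{-\frac{\mu}{3}}\, e^{-\frac{\sigma^2}{9}}. \qquad (143) $$

This is the most likely ET_Distance from Earth.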


How likely?

To find the value of the probability density
function $f_{\text{ET\_Distance}}(r)$ corresponding to this value
of the mode, we must obviously replace (143) into (113).
After a few rearrangements, which we skip for the
sake of brevity, one gets

$$ \text{Peak value of } f_{\text{ET\_Distance}}(r) = f_{\text{ET\_Distance}}(r_{\text{mode}}) = \frac{3}{\sqrt{2\pi}\,\sigma\, C}\, e^{\frac{\mu}{3}}\, e^{\frac{\sigma^2}{18}}. \qquad (144) $$

This is the peak height in the pdf $f_{\text{ET\_Distance}}(r)$.
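A numerical cross-check of (143) and (144), sketched in Python with hypothetical helper names and illustrative parameters: a derivative-free golden-section search (a standard technique swapped in here, not the paper's analytic method) should land on the same abscissa as (143), and the pdf evaluated there should match the closed-form peak height (144).

```python
import math


def maccone_pdf(r: float, mu: float, sigma: float, c: float) -> float:
    """Density (113) of ET_Distance for r > 0."""
    z = math.log(c ** 3 / r ** 3) - mu
    return 3.0 / (math.sqrt(2.0 * math.pi) * sigma * r) * math.exp(-z * z / (2.0 * sigma ** 2))


def mode(mu: float, sigma: float, c: float) -> float:
    """Peak abscissa (143)."""
    return c * math.exp(-mu / 3.0) * math.exp(-sigma ** 2 / 9.0)


def peak_value(mu: float, sigma: float, c: float) -> float:
    """Peak height (144)."""
    return 3.0 * math.exp(mu / 3.0) * math.exp(sigma ** 2 / 18.0) / (math.sqrt(2.0 * math.pi) * sigma * c)


def golden_section_max(f, a: float, b: float, rel_tol: float = 1e-9) -> float:
    """Maximum of a unimodal function f on [a, b] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0          # inverse golden ratio, about 0.618
    x1, x2 = b - g * (b - a), a + g * (b - a)
    while b - a > rel_tol * (abs(a) + abs(b)):
        if f(x1) < f(x2):                       # maximum lies in [x1, b]
            a, x1 = x1, x2
            x2 = a + g * (b - a)
        else:                                   # maximum lies in [a, x2]
            b, x2 = x2, x1
            x1 = b - g * (b - a)
    return 0.5 * (a + b)


if __name__ == "__main__":
    MU, SIGMA, C = 7.0, 1.0, 28845.0           # illustrative parameters only
    r_star = golden_section_max(lambda r: maccone_pdf(r, MU, SIGMA, C), 1.0, 20_000.0)
    print(r_star, mode(MU, SIGMA, C))
    print(maccone_pdf(r_star, MU, SIGMA, C), peak_value(MU, SIGMA, C))
```

The search is safe here because ln f is a concave quadratic in ln r, so (113) has a single peak on r > 0.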


Next to the mode, the median m (ref. [9]) is one
more statistical number used to characterize any
probability distribution. It is defined as the
independent variable abscissa m such that a
realization of the random variable will take up a
value lower than m with 50% probability or a value
higher than m with 50% probability again. In other
words, the median m splits up our probability
density in exactly two equally probable parts. Since
the probability of occurrence of the random event
equals the area under its density curve (i.e. the
definite integral under its density curve), the
median m (of the ET_Distance distribution, in this
case) is defined as the integral upper limit m:

$$ \int_0^m f_{\text{ET\_Distance}}(r)\, dr = \frac{1}{2}. \qquad (145) $$

Upon replacing (113), this becomes

$$ \int_0^m \frac{3}{\sqrt{2\pi}\,\sigma\, r}\; e^{-\frac{\left[\ln\left(\frac{C^3}{r^3}\right) - \mu\right]^2}{2\sigma^2}}\, dr = \frac{1}{2}. \qquad (146) $$


In order to find m, we may not differentiate (146)
with respect to m, since the “precise” factor ½ on the
right would then disappear into a zero. On the
contrary, we may try to perform the obvious
substitution

$$ z = \frac{\ln\left(\frac{C^3}{r^3}\right) - \mu}{\sqrt{2}\,\sigma} \qquad (147) $$

into the integral (146) to reduce it to the
integral (85) defining the error function erf(z). Then,
after a few reductions that we leave to the reader as
an exercise, the full equation (145) defining the
median is turned into the corresponding equation
involving the error function erf(x) as defined by (85):

$$ \frac{1}{2} - \frac{1}{2}\, \mathrm{erf}\left( \frac{\ln\left(\frac{C^3}{m^3}\right) - \mu}{\sqrt{2}\,\sigma} \right) = \frac{1}{2}, \qquad (148) $$
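Random variable: ET_Distance between any two neighboring ET Civilizations in Galaxy, assuming they are UNIFORMLY distributed throughout the whole Galaxy volume.
Probability distribution: Unnamed (Paul Davies suggested “Maccone distribution”)
Probability density function: $f_{\text{ET\_Distance}}(r) = \frac{3}{\sqrt{2\pi}\,\sigma\, r}\, e^{-\frac{[\ln(C^3/r^3) - \mu]^2}{2\sigma^2}}$
(Defining the positive numeric constant C): $C = \sqrt[3]{6\, R_{\text{Galaxy}}^2\, h_{\text{Galaxy}}} \approx 28845$ light years
Mean value: $C\, e^{-\mu/3}\, e^{\sigma^2/18}$
Variance: $C^2\, e^{-2\mu/3}\, e^{\sigma^2/9} \left( e^{\sigma^2/9} - 1 \right)$
Standard deviation: $C\, e^{-\mu/3}\, e^{\sigma^2/18} \sqrt{e^{\sigma^2/9} - 1}$
All the moments, i.e. k-th moment: $\langle \text{ET\_Distance}^k \rangle = C^k\, e^{-k\mu/3}\, e^{k^2\sigma^2/18}$
Mode (= abscissa of the probability density function peak): $r_{\text{mode}} = r_{\text{peak}} = C\, e^{-\mu/3}\, e^{-\sigma^2/9}$
Value of the mode peak: $f_{\text{ET\_Distance}}(r_{\text{mode}}) = \frac{3}{\sqrt{2\pi}\,\sigma\, C}\, e^{\mu/3}\, e^{\sigma^2/18}$
Median (= fifty-fifty probability value for ET_Distance): $m = C\, e^{-\mu/3}$
Skewness: $\frac{e^{\sigma^2/3} - 3\, e^{\sigma^2/9} + 2}{\left( e^{\sigma^2/9} - 1 \right)^{3/2}}$
Kurtosis: $e^{4\sigma^2/9} + 2\, e^{\sigma^2/3} + 3\, e^{2\sigma^2/9} - 6$
Expression of μ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\mu = \sum_{i=1}^{7} \frac{b_i[\ln(b_i) - 1] - a_i[\ln(a_i) - 1]}{b_i - a_i}$
Expression of σ² in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$: $\sigma^2 = \sum_{i=1}^{7} \left[ 1 - \frac{a_i\, b_i\, [\ln(b_i) - \ln(a_i)]^2}{(b_i - a_i)^2} \right]$

Table 3. Summary of the properties of the probability distribution that applies to the random variable ET_Distance
yielding the (average) distance between any two neighboring communicating civilizations in the Galaxy.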
that is

$$ \mathrm{erf}\left( \frac{\ln\left(\frac{C^3}{m^3}\right) - \mu}{\sqrt{2}\,\sigma} \right) = 0. \qquad (149) $$

Since from the definition (85) one obviously has
erf(0) = 0, and erf is strictly increasing, (149) yields

$$ \frac{\ln\left(\frac{C^3}{m^3}\right) - \mu}{\sqrt{2}\,\sigma} = 0, \qquad (150) $$

whence finally

$$ m = \text{median} = C\, e^{-\frac{\mu}{3}}. \qquad (151) $$
This is the median of the ET_Distance distribution.
In other words, this is the distance such that,
with 50% probability, the actual value of ET_Distance will
be lower than this median, and with 50% probability
it will be higher.
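The median (151) can be verified with the closed-form distribution function implied by (147) and (148), P(ET_Distance ≤ r) = 1/2 - (1/2) erf([ln(C³/r³) - μ]/(√2 σ)). The Python sketch below (hypothetical names, illustrative parameters) uses the standard-library `math.erf`.

```python
import math


def et_distance_cdf(r: float, mu: float, sigma: float, c: float) -> float:
    """P(ET_Distance <= r), obtained from (146) with the substitution (147)."""
    z = (math.log(c ** 3 / r ** 3) - mu) / (math.sqrt(2.0) * sigma)
    return 0.5 - 0.5 * math.erf(z)


def median(mu: float, c: float) -> float:
    """Median (151); note that it does not depend on sigma."""
    return c * math.exp(-mu / 3.0)


if __name__ == "__main__":
    m = median(7.0, 28845.0)
    for sigma in (0.5, 1.0, 2.0):
        print(sigma, et_distance_cdf(m, 7.0, sigma, 28845.0))  # 0.5 up to rounding
```

Because the median does not involve σ, widening the input error bars spreads the distribution without moving its fifty-fifty point.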

In conclusion, we find it useful to summarize all the
equations that we derived about the random variable
ET_Distance in Table 3.


NUMERICAL EXAMPLE OF THE 
ET_DISTANCE DISTRIBUTION 


In this section we provide a numerical
example of the analytic calculations carried out so
far.


Consider the Drake Equation values reported
in Table 1. Then, the graph of the corresponding
probability density function of the nearest
ET_Distance, $f_{\text{ET\_Distance}}(r)$, is shown in Figure 6.


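[Figure 6 plot: DISTANCE OF NEAREST ET_CIVILIZATION. Vertical axis: probability density function (1/meters); horizontal axis: ET_Distance from Earth (light years), from 0 to 5000.]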


Figure 6. This is the probability density of finding the nearest ExtraTerrestrial Civilization at the distance r from
Earth (in light years) if the values assumed in the Drake Equation are those shown in Table 1. The relevant
probability density function $f_{\text{ET\_Distance}}(r)$ is given by equation (113). Its mode (peak abscissa) equals 1933
light years, but its mean value is higher since the curve has a long tail on the right: the mean value equals in
fact 2670 light years. Finally, the standard deviation equals 1309 light years: THIS IS GOOD NEWS FOR
SETI, inasmuch as the nearest ET Civilization might lie just 1 sigma below the mean, i.e. at 2670 - 1309 = 1361 light years
from us.


From Figure 6, we see that the probability of
finding ExtraTerrestrials is practically zero up to a
distance of about 500 light years from Earth. Then
it starts increasing with the increasing distance
from Earth, and reaches its maximum at

$$ r_{\text{mode}} = r_{\text{peak}} = C\, e^{-\frac{\mu}{3}}\, e^{-\frac{\sigma^2}{9}} = 1933 \text{ light years}. \qquad (152) $$

This is the MOST LIKELY VALUE of the
distance at which we can expect to find the
nearest ExtraTerrestrial civilization.


It is not, however, the mean value of the
probability distribution (113) for $f_{\text{ET\_Distance}}(r)$. In
fact, the probability density (113) has an infinite
tail on the right, as clearly shown in Figure 6, and
hence its mean value is higher than its mode:
by (119) and (143), their ratio is $e^{\sigma^2/6} > 1$.
As given by (119), its mean value is

$$ r_{\text{mean\_value}} = C\, e^{-\frac{\mu}{3}}\, e^{\frac{\sigma^2}{18}} = 2670 \text{ light years}. \qquad (153) $$

This is the MEAN (value of the) DISTANCE
at which we can expect to find ExtraTerrestrials.


After having found the above two distances (1933
and 2670 light years, respectively), the next natural
question that arises is: “what is the range, back and
forth around the mean value of the distance, within
which we can expect to find ExtraTerrestrials with
the highest hopes?” The answer to this question
is given by the notion of standard deviation, which
we already found to be given by (123)

$$ \sigma_{\text{ET\_Distance}} = C\, e^{-\frac{\mu}{3}}\, e^{\frac{\sigma^2}{18}} \sqrt{e^{\frac{\sigma^2}{9}} - 1} = 1309 \text{ light years}. \qquad (154) $$


More precisely, this is the so-called 1-sigma
(distance) level. Probability theory then shows that
the nearest ExtraTerrestrial civilization is expected
to be located within this range, i.e. between the two
distances of (2670 - 1309) = 1361 light years and
(2670 + 1309) = 3979 light years, with probability
given by the integral of $f_{\text{ET\_Distance}}(r)$ taken
between these two lower and upper limits, that is:

$$ \int_{1361 \text{ light years}}^{3979 \text{ light years}} f_{\text{ET\_Distance}}(r)\, dr = 0.75 = 75\%. \qquad (155) $$

In plain words: with 75% probability, the nearest
ExtraTerrestrial civilization is located between
the distances of 1361 and 3979 light years from us,
having assumed the input values to the Drake
Equation given by Table 1. If we change those
input values, then all the numbers change again.
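Since the Table 1 inputs are not repeated in this section, the Python sketch below works backwards instead: it recovers μ and σ² from the mean (153) and standard deviation (154), together with C ≈ 28845 ly from (105), and then recomputes the mode (152) and the 1-sigma probability (155) with `math.erf`. The variable names are hypothetical, and the script can only reproduce the rounded figures quoted in the text; the mode should come out near 1933 ly and the probability near 0.75.

```python
import math

C_LY = 28845.0      # constant C of (105), light years
MEAN_LY = 2670.0    # mean value quoted in (153)
STD_LY = 1309.0     # standard deviation quoted in (154)

# Invert (119) and (123): STD / MEAN = sqrt(exp(sigma^2 / 9) - 1).
sigma2 = 9.0 * math.log1p((STD_LY / MEAN_LY) ** 2)
# Invert (119): ln(MEAN) = ln(C) - mu / 3 + sigma^2 / 18.
mu = 3.0 * (math.log(C_LY) + sigma2 / 18.0 - math.log(MEAN_LY))
sigma = math.sqrt(sigma2)


def cdf(r: float) -> float:
    """P(ET_Distance <= r) for the recovered mu and sigma, cf. (147)-(148)."""
    z = (math.log(C_LY ** 3 / r ** 3) - mu) / (math.sqrt(2.0) * sigma)
    return 0.5 - 0.5 * math.erf(z)


mode_ly = C_LY * math.exp(-mu / 3.0) * math.exp(-sigma2 / 9.0)       # (143), (152)
one_sigma_prob = cdf(MEAN_LY + STD_LY) - cdf(MEAN_LY - STD_LY)       # (155)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"mode = {mode_ly:.0f} ly, P(1361 ly < r < 3979 ly) = {one_sigma_prob:.3f}")
```

Note that the mode equals the mean times e^(-σ²/6), so the 1933 ly figure follows from the mean and standard deviation alone.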


9. THE “DATA ENRICHMENT 
PRINCIPLE” AS THE BEST CLT 
CONSEQUENCE UPON THE 
STATISTICAL DRAKE EQUATION 
(ANY NUMBER OF FACTORS 
ALLOWED) 


As a fitting climax to all the statistical
equations developed so far, let us now state our
“DATA ENRICHMENT PRINCIPLE.” It simply states that


“The Higher the Number of Factors in the
Statistical Drake equation, The Better.”


Put in this simple way, it simply looks like a
new way of saying that the CLT lets the random
variable Y approach the normal distribution when
the number of terms in the sum (4) approaches
infinity. And this is indeed the case. However, our
“Data Enrichment Principle” has more profound
methodological consequences that we cannot
explain now, but hope to describe more precisely
in one or more coming papers.
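The CLT statement behind the principle can be illustrated with a small Monte Carlo experiment, sketched in Python below. It assumes, purely for illustration, that every Drake factor D_i is uniform on the same hypothetical interval [1, 10]; it checks the per-factor mean and variance of ln D_i against the two expressions listed in Table 3, and shows that the skewness of Y = Σ ln D_i shrinks toward the Gaussian value 0 as factors are added.

```python
import math
import random


def log_uniform_mean(a: float, b: float) -> float:
    """E[ln D] for D uniform on [a, b] (Table 3, per-factor term of mu)."""
    return (b * (math.log(b) - 1.0) - a * (math.log(a) - 1.0)) / (b - a)


def log_uniform_var(a: float, b: float) -> float:
    """Var[ln D] for D uniform on [a, b] (Table 3, per-factor term of sigma^2)."""
    return 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2


def simulate_y(bounds: list, samples: int, seed: int = 1) -> list:
    """Monte Carlo draws of Y = sum_i ln D_i with independent uniform D_i."""
    rng = random.Random(seed)
    return [sum(math.log(rng.uniform(a, b)) for a, b in bounds) for _ in range(samples)]


def mean_var_skew(xs: list) -> tuple:
    """Sample mean, variance and skewness (population-style moments)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m, m2, m3 / m2 ** 1.5


if __name__ == "__main__":
    for n_factors in (1, 3, 10):
        bounds = [(1.0, 10.0)] * n_factors        # hypothetical input ranges
        mean, var, skew = mean_var_skew(simulate_y(bounds, 20_000))
        print(n_factors, round(mean, 3), round(var, 3), round(skew, 3))
```

For independent, identically distributed factors the skewness of the sum falls roughly as one over the square root of the number of factors, which is one concrete reading of "the more factors, the better."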


CONCLUSIONS 


We have sought to extend the classical Drake 
equation to let it encompass Statistics and 
Probability. 


This approach appears to pave the way to
future, more profound investigations intended not
only to associate “error bars” to each factor in the
Drake equation, but especially to increase the
number of factors themselves. In fact, this seems to
be the only way to incorporate into the Drake
equation more and more new scientific information
as soon as it becomes available. In the long run,
the Statistical Drake equation might just become a
huge computer code, growing in size and
especially in the depth of the scientific information
it contains. It would thus be Humanity’s first
“Encyclopaedia Galactica.”

Unfortunately, to extend the Drake equation to
Statistics, it was necessary to use a mathematical
apparatus that is more sophisticated than just the
simple product of seven numbers.


When this author had the honour and privilege
to present his results at the SETI Institute on April
11th, 2008, in front of an audience also including
Professor Frank Drake, he felt he had to add these
words: “My apologies, Frank, for disrupting the
beautiful simplicity of your equation.”